Chapter 3 - Advanced Request-Reply Patterns covered advanced uses of ZeroMQ’s request-reply pattern with working examples. This chapter looks at the general question of reliability and builds a set of reliable messaging patterns on top of ZeroMQ’s core request-reply pattern.
In this chapter, we focus heavily on user-space request-reply patterns, reusable models that help you design your own ZeroMQ architectures:
The Lazy Pirate pattern: reliable request-reply from the client side
The Simple Pirate pattern: reliable request-reply using load balancing
The Paranoid Pirate pattern: reliable request-reply with heartbeating
The Majordomo pattern: service-oriented reliable queuing
The Titanic pattern: disk-based/disconnected reliable queuing
The Binary Star pattern: primary-backup server failover
The Freelance pattern: brokerless reliable request-reply
Most people who speak of “reliability” don’t really know what they mean. We can only define reliability in terms of failure. That is, if we can handle a certain set of well-defined and understood failures, then we are reliable with respect to those failures. No more, no less. So let’s look at the possible causes of failure in a distributed ZeroMQ application, in roughly descending order of probability:
Application code is the worst offender. It can crash and exit, freeze and stop responding to input, run too slowly for its input, exhaust all memory, and so on.
System code–such as brokers we write using ZeroMQ–can die for the same reasons as application code. System code should be more reliable than application code, but it can still crash and burn, and especially run out of memory if it tries to queue messages for slow clients.
Message queues can overflow, typically in system code that has learned to deal brutally with slow clients. When a queue overflows, it starts to discard messages. So we get “lost” messages.
Networks can fail (e.g., WiFi gets switched off or goes out of range). ZeroMQ will automatically reconnect in such cases, but in the meantime, messages may get lost.
Hardware can fail and take with it all the processes running on that box.
Networks can fail in exotic ways, e.g., some ports on a switch may die and those parts of the network become inaccessible.
Entire data centers can be struck by lightning, earthquakes, fire, or more mundane power or cooling failures.
To make a software system fully reliable against all of these possible failures is an enormously difficult and expensive job and goes beyond the scope of this book.
Because the first five cases in the above list cover 99.9% of real world requirements outside large companies (according to a highly scientific study I just ran, which also told me that 78% of statistics are made up on the spot, and moreover never to trust a statistic that we didn’t falsify ourselves), that’s what we’ll examine. If you’re a large company with money to spend on the last two cases, contact my company immediately! There’s a large hole behind my beach house waiting to be converted into an executive swimming pool.
So to make things brutally simple, reliability is “keeping things working properly when code freezes or crashes”, a situation we’ll shorten to “dies”. However, the things we want to keep working properly are more complex than just messages. We need to take each core ZeroMQ messaging pattern and see how to make it work (if we can) even when code dies.
Let’s take them one-by-one:
Request-reply: if the server dies (while processing a request), the client can figure that out because it won’t get an answer back. Then it can give up in a huff, wait and try again later, find another server, and so on. As for the client dying, we can brush that off as “someone else’s problem” for now.
Pub-sub: if the client dies (having gotten some data), the server doesn’t know about it. Pub-sub doesn’t send any information back from client to server. But the client can contact the server out-of-band, e.g., via request-reply, and ask, “please resend everything I missed”. As for the server dying, that’s out of scope for here. Subscribers can also self-verify that they’re not running too slowly, and take action (e.g., warn the operator and die) if they are.
Pipeline: if a worker dies (while working), the ventilator doesn’t know about it. Pipelines, like the grinding gears of time, only work in one direction. But the downstream collector can detect that one task didn’t get done, and send a message back to the ventilator saying, “hey, resend task 324!” If the ventilator or collector dies, whatever upstream client originally sent the work batch can get tired of waiting and resend the whole lot. It’s not elegant, but system code should really not die often enough to matter.
In this chapter we’ll focus just on request-reply, which is the low-hanging fruit of reliable messaging.
The basic request-reply pattern (a REQ client socket doing a blocking send/receive to a REP server socket) scores low on handling the most common types of failure. If the server crashes while processing the request, the client just hangs forever. If the network loses the request or the reply, the client hangs forever.
Request-reply is still much better than TCP, thanks to ZeroMQ’s ability to reconnect peers silently, to load balance messages, and so on. But it’s still not good enough for real work. The only case where you can really trust the basic request-reply pattern is between two threads in the same process where there’s no network or separate server process to die.
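To make the failure mode concrete, here is a minimal sketch of that naive blocking client (shown in Python with the pyzmq binding; the endpoint matches the examples later in this chapter). If the server dies after taking the request, the recv() call simply never returns:

# Naive blocking REQ client (sketch): this is the pattern that hangs.
# If the server dies while we wait, recv() blocks forever.
import zmq

context = zmq.Context()
client = context.socket(zmq.REQ)
client.connect("tcp://localhost:5555")

client.send(b"Hello")
reply = client.recv()   # No timeout: hangs forever if the reply never comes
print(f"Received: {reply!r}")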
However, with a little extra work, this humble pattern becomes a good basis for real work across a distributed network, and we get a set of reliable request-reply (RRR) patterns that I like to call the Pirate patterns (you’ll eventually get the joke, I hope).
There are, in my experience, roughly three ways to connect clients to servers. Each needs a specific approach to reliability:
Multiple clients talking directly to a single server. Use case: a single well-known server to which clients need to talk. Types of failure we aim to handle: server crashes and restarts, and network disconnects.
Multiple clients talking to a broker proxy that distributes work to multiple workers. Use case: service-oriented transaction processing. Types of failure we aim to handle: worker crashes and restarts, worker busy looping, worker overload, queue crashes and restarts, and network disconnects.
Multiple clients talking to multiple servers with no intermediary proxies. Use case: distributed services such as name resolution. Types of failure we aim to handle: service crashes and restarts, service busy looping, service overload, and network disconnects.
Each of these approaches has its trade-offs and often you’ll mix them. We’ll look at all three in detail.
We can get very simple reliable request-reply with some changes to the client. We call this the Lazy Pirate pattern. Rather than doing a blocking receive, we:
Poll the REQ socket and receive from it only when we're sure a reply has arrived.
Resend a request, if no reply has arrived within a timeout period.
Abandon the transaction if there is still no reply after several requests.
If you try to use a REQ socket in anything other than a strict send/receive fashion, you’ll get an error (technically, the REQ socket implements a small finite-state machine to enforce the send/receive ping-pong, and so the error code is called “EFSM”). This is slightly annoying when we want to use REQ in a pirate pattern, because we may send several requests before getting a reply.
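As a quick illustration (a sketch in Python; the exact error text depends on the binding), a second send without an intervening receive is rejected by that state machine:

# Demonstrating the REQ state machine: two sends in a row raise EFSM.
import zmq

context = zmq.Context()
client = context.socket(zmq.REQ)
client.connect("tcp://localhost:5555")

client.send(b"1")
try:
    client.send(b"2")   # Violates the send/receive ping-pong
except zmq.ZMQError as e:
    print(f"E: {e}")    # e.g. "Operation cannot be accomplished in current state"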
The pretty good brute force solution is to close and reopen the REQ socket after an error:
#include <czmq.h>

#define REQUEST_TIMEOUT 2500    // msecs, (> 1000!)
#define REQUEST_RETRIES 3       // Before we abandon
#define SERVER_ENDPOINT "tcp://localhost:5555"

int main (void)
{
    printf ("I: connecting to server...\n");
    zsock_t *client = zsock_new_req (SERVER_ENDPOINT);
    assert (client);

    int sequence = 0;
    int retries_left = REQUEST_RETRIES;
    while (retries_left && !zsys_interrupted) {
        // We send a request, then we work to get a reply
        char request [10];
        sprintf (request, "%d", ++sequence);
        zstr_send (client, request);

        int expect_reply = 1;
        while (expect_reply) {
            // Poll socket for a reply, with timeout
            zmq_pollitem_t items [] = {
                { zsock_resolve (client), 0, ZMQ_POLLIN, 0 } };
            int rc = zmq_poll (items, 1, REQUEST_TIMEOUT * ZMQ_POLL_MSEC);
            if (rc == -1)
                break;          // Interrupted

            // Here we process a server reply and exit our loop if the
            // reply is valid. If we didn't get a reply we close the
            // client socket, open it again and resend the request. We
            // try a number of times before finally abandoning:
            if (items [0].revents & ZMQ_POLLIN) {
                // We got a reply from the server, must match sequence
                char *reply = zstr_recv (client);
                if (!reply)
                    break;      // Interrupted
                if (atoi (reply) == sequence) {
                    printf ("I: server replied OK (%s)\n", reply);
                    retries_left = REQUEST_RETRIES;
                    expect_reply = 0;
                }
                else
                    printf ("E: malformed reply from server: %s\n", reply);
                free (reply);
            }
            else
            if (--retries_left == 0) {
                printf ("E: server seems to be offline, abandoning\n");
                break;
            }
            else {
                printf ("W: no response from server, retrying...\n");
                // Old socket is confused; close it and open a new one
                zsock_destroy (&client);
                printf ("I: reconnecting to server...\n");
                client = zsock_new_req (SERVER_ENDPOINT);
                // Send request again, on new socket
                zstr_send (client, request);
            }
        }
    }
    zsock_destroy (&client);
    return 0;
}
lpclient: Lazy Pirate client in C++
//
// Lazy Pirate client
// Use zmq_poll to do a safe request-reply
// To run, start lpserver and then randomly kill/restart it
//
#include"zhelpers.hpp"#include<sstream>#define REQUEST_TIMEOUT 2500 // msecs, (> 1000!)
#define REQUEST_RETRIES 3 // Before we abandon
// Helper function that returns a new configured socket
// connected to the Hello World server
//
static zmq::socket_t * s_client_socket (zmq::context_t & context) {
std::cout << "I: connecting to server..." << std::endl;
zmq::socket_t * client = new zmq::socket_t (context, ZMQ_REQ);
client->connect ("tcp://localhost:5555");
// Configure socket to not wait at close time
int linger = 0;
client->setsockopt (ZMQ_LINGER, &linger, sizeof (linger));
return client;
}
int main () {
zmq::context_t context (1);
zmq::socket_t * client = s_client_socket (context);
int sequence = 0;
int retries_left = REQUEST_RETRIES;
while (retries_left) {
std::stringstream request;
request << ++sequence;
s_send (*client, request.str());
sleep (1);
bool expect_reply = true;
while (expect_reply) {
// Poll socket for a reply, with timeout
zmq::pollitem_t items[] = {
{ *client, 0, ZMQ_POLLIN, 0 } };
zmq::poll (&items[0], 1, REQUEST_TIMEOUT);
// If we got a reply, process it
if (items[0].revents & ZMQ_POLLIN) {
// We got a reply from the server, must match sequence
std::string reply = s_recv (*client);
if (atoi (reply.c_str ()) == sequence) {
std::cout << "I: server replied OK (" << reply << ")" << std::endl;
retries_left = REQUEST_RETRIES;
expect_reply = false;
}
else {
std::cout << "E: malformed reply from server: " << reply << std::endl;
}
}
else if (--retries_left == 0) {
std::cout << "E: server seems to be offline, abandoning" << std::endl;
expect_reply = false;
break;
}
else {
std::cout << "W: no response from server, retrying..." << std::endl;
// Old socket will be confused; close it and open a new one
delete client;
client = s_client_socket (context);
// Send request again, on new socket
s_send (*client, request.str());
}
}
}
delete client;
return 0;
}
program lpclient;
//
// Lazy Pirate client
// Use zmq_poll to do a safe request-reply
// To run, start lpserver and then randomly kill/restart it
// @author Varga Balazs <bb.varga@gmail.com>
//
{$APPTYPE CONSOLE}
uses
SysUtils
, zmqapi
;
const
REQUEST_TIMEOUT = 2500; // msecs, (> 1000!)
REQUEST_RETRIES = 3; // Before we abandon
SERVER_ENDPOINT = 'tcp://localhost:5555';
var
ctx: TZMQContext;
client: TZMQSocket;
sequence,
retries_left,
expect_reply: Integer;
request,
reply: Utf8String;
poller: TZMQPoller;
begin
ctx := TZMQContext.create;
Writeln( 'I: connecting to server...' );
client := ctx.Socket( stReq );
client.Linger := 0;
client.connect( SERVER_ENDPOINT );
poller := TZMQPoller.Create( true );
poller.Register( client, [pePollIn] );
sequence := 0;
retries_left := REQUEST_RETRIES;
while ( retries_left > 0 ) and not ctx.Terminated do
try
// We send a request, then we work to get a reply
inc( sequence );
request := Format( '%d', [sequence] );
client.send( request );
expect_reply := 1;
while ( expect_reply > 0 ) do
begin
// Poll socket for a reply, with timeout
poller.poll( REQUEST_TIMEOUT );
// Here we process a server reply and exit our loop if the
// reply is valid. If we didn't get a reply we close the client
// socket and resend the request. We try a number of times
// before finally abandoning:
if pePollIn in poller.PollItem[0].revents then
begin
// We got a reply from the server, must match sequence
client.recv( reply );
if StrToInt( reply ) = sequence then
begin
Writeln( Format( 'I: server replied OK (%s)', [reply] ) );
retries_left := REQUEST_RETRIES;
expect_reply := 0;
end else
Writeln( Format( 'E: malformed reply from server: %s', [ reply ] ) );
end else
begin
dec( retries_left );
if retries_left = 0 then
begin
Writeln( 'E: server seems to be offline, abandoning' );
break;
end else
begin
Writeln( 'W: no response from server, retrying...' );
// Old socket is confused; close it and open a new one
poller.Deregister( client, [pePollIn] );
client.Free;
Writeln( 'I: reconnecting to server...' );
client := ctx.Socket( stReq );
client.Linger := 0;
client.connect( SERVER_ENDPOINT );
poller.Register( client, [pePollIn] );
// Send request again, on new socket
client.send( request );
end;
end;
end;
except
end;
poller.Free;
ctx.Free;
end.
// Lazy Pirate client
// Use zmq_poll to do a safe request-reply
// To run, start lpserver and then randomly kill/restart it
//
// Author: iano <scaly.iano@gmail.com>
// Based on C example
package main
import (
"fmt"
zmq "github.com/alecthomas/gozmq""strconv""time"
)
const (
REQUEST_TIMEOUT = time.Duration(2500) * time.Millisecond
REQUEST_RETRIES = 3
SERVER_ENDPOINT = "tcp://localhost:5555"
)
func main() {
context, _ := zmq.NewContext()
defer context.Close()
fmt.Println("I: Connecting to server...")
client, _ := context.NewSocket(zmq.REQ)
client.Connect(SERVER_ENDPOINT)
for sequence, retriesLeft := 1, REQUEST_RETRIES; retriesLeft > 0; sequence++ {
fmt.Printf("I: Sending (%d)\n", sequence)
client.Send([]byte(strconv.Itoa(sequence)), 0)
for expectReply := true; expectReply; {
// Poll socket for a reply, with timeout
items := zmq.PollItems{
zmq.PollItem{Socket: client, Events: zmq.POLLIN},
}
if _, err := zmq.Poll(items, REQUEST_TIMEOUT); err != nil {
panic(err) // Interrupted
}
// .split process server reply
// Here we process a server reply and exit our loop if the
// reply is valid. If we didn't get a reply we close the client
// socket and resend the request. We try a number of times
// before finally abandoning:
if item := items[0]; item.REvents&zmq.POLLIN != 0 {
// We got a reply from the server, must match sequence
reply, err := item.Socket.Recv(0)
if err != nil {
panic(err) // Interrupted
}
if replyInt, err := strconv.Atoi(string(reply)); replyInt == sequence && err == nil {
fmt.Printf("I: Server replied OK (%s)\n", reply)
retriesLeft = REQUEST_RETRIES
expectReply = false
} else {
fmt.Printf("E: Malformed reply from server: %s", reply)
}
		} else if retriesLeft--; retriesLeft == 0 {
fmt.Println("E: Server seems to be offline, abandoning")
client.SetLinger(0)
client.Close()
break
} else {
fmt.Println("W: No response from server, retrying...")
// Old socket is confused; close it and open a new one
client.SetLinger(0)
client.Close()
client, _ = context.NewSocket(zmq.REQ)
client.Connect(SERVER_ENDPOINT)
fmt.Printf("I: Resending (%d)\n", sequence)
// Send request again, on new socket
client.Send([]byte(strconv.Itoa(sequence)), 0)
}
}
}
}
lpclient: Lazy Pirate client in Haskell
{--
Lazy Pirate client in Haskell
--}
module Main where

import System.ZMQ4.Monadic
import System.Random (randomRIO)
import System.Exit (exitSuccess)
import Control.Monad (forever, when)
import Control.Concurrent (threadDelay)
import Data.ByteString.Char8 (pack, unpack)

requestRetries = 3
requestTimeout_ms = 2500
serverEndpoint = "tcp://localhost:5555"

main :: IO ()
main =
runZMQ $ do
liftIO $ putStrLn "I: Connecting to server"
client <- socket Req
connect client serverEndpoint
sendServer 1 requestRetries client
sendServer :: Int -> Int -> Socket z Req -> ZMQ z ()
sendServer _ 0 _ = return ()
sendServer seq retries client = do
send client [] (pack $ show seq)
pollServer seq retries client
pollServer :: Int -> Int -> Socket z Req -> ZMQ z ()
pollServer seq retries client = do
[evts] <- poll requestTimeout_ms [Sock client [In] Nothing]
    if In `elem` evts
        then do
reply <- receive client
if (read . unpack $ reply) == seq
                then do
liftIO $ putStrLn $ "I: Server replied OK " ++ (unpack reply)
sendServer (seq+1) requestRetries client
                else do
liftIO $ putStrLn $ "E: malformed reply from server: " ++ (unpack reply)
pollServer seq retries client
        else if retries == 0
            then liftIO $ putStrLn "E: Server seems to be offline, abandoning" >> exitSuccess
            else do
liftIO $ putStrLn $ "W: No response from server, retrying..."
client' <- socket Req
connect client' serverEndpoint
send client' [] (pack $ show seq)
pollServer seq (retries-1) client'
lpclient: Lazy Pirate client in Haxe
package ;
import haxe.Stack;
import neko.Lib;
import org.zeromq.ZContext;
import org.zeromq.ZFrame;
import org.zeromq.ZMQ;
import org.zeromq.ZMQException;
import org.zeromq.ZMQPoller;
import org.zeromq.ZSocket;
/**
* Lazy Pirate client
* Use zmq_poll to do a safe request-reply
* To run, start lpserver and then randomly kill / restart it.
*
* @see http://zguide.zeromq.org/page:all#Client-side-Reliability-Lazy-Pirate-Pattern
 */
class LPClient
{
    private static inline var REQUEST_TIMEOUT = 2500;    // msecs, (> 1000!)
    private static inline var REQUEST_RETRIES = 3;       // Before we abandon
    private static inline var SERVER_ENDPOINT = "tcp://localhost:5555";

    public static function main() {
Lib.println("** LPClient (see: http://zguide.zeromq.org/page:all#Client-side-Reliability-Lazy-Pirate-Pattern)");
var ctx:ZContext = new ZContext();
Lib.println("I: connecting to server ...");
var client = ctx.createSocket(ZMQ_REQ);
if (client == null)
return;
client.connect(SERVER_ENDPOINT);
var sequence = 0;
var retries_left = REQUEST_RETRIES;
var poller = new ZMQPoller();
while (retries_left > 0 && !ZMQ.isInterrupted()) {
            // We send a request, then we work to get a reply
            var request = Std.string(++sequence);
ZFrame.newStringFrame(request).send(client);
var expect_reply = true;
while (expect_reply) {
poller.registerSocket(client, ZMQ.ZMQ_POLLIN());
                // Poll socket for a reply, with timeout
                try {
var res = poller.poll(REQUEST_TIMEOUT * 1000);
} catch (e:ZMQException) {
trace("ZMQException #:" + e.errNo + ", str:" + e.str());
trace (Stack.toString(Stack.exceptionStack()));
ctx.destroy();
return;
}
                // If we got a reply, process it
                if (poller.pollin(1)) {
                    // We got a reply from the server, must match sequence
                    var replyFrame = ZFrame.recvFrame(client);
                    if (replyFrame == null)
                        break;    // Interrupted
                    if (Std.parseInt(replyFrame.toString()) == sequence) {
Lib.println("I: server replied OK (" + sequence + ")");
retries_left = REQUEST_RETRIES;
expect_reply = false;
} else
Lib.println("E: malformed reply from server: " + replyFrame.toString());
replyFrame.destroy();
                } else if (--retries_left == 0) {
Lib.println("E: server seems to be offline, abandoning");
break;
} else {
Lib.println("W: no response from server, retrying...");
// Old socket is confused, close it and open a new one
ctx.destroySocket(client);
Lib.println("I: reconnecting to server...");
client = ctx.createSocket(ZMQ_REQ);
client.connect(SERVER_ENDPOINT);
// Send request again, on new socket
ZFrame.newStringFrame(request).send(client);
}
poller.unregisterAllSockets();
}
}
ctx.destroy();
}
}
lpclient: Lazy Pirate client in Java
package guide;

import org.zeromq.SocketType;
import org.zeromq.ZContext;
import org.zeromq.ZMQ;
import org.zeromq.ZMQ.Poller;
import org.zeromq.ZMQ.Socket;
//
// Lazy Pirate client
// Use zmq_poll to do a safe request-reply
// To run, start lpserver and then randomly kill/restart it
//
public class lpclient
{
    private final static int REQUEST_TIMEOUT = 2500;    // msecs, (> 1000!)
    private final static int REQUEST_RETRIES = 3;       // Before we abandon
    private final static String SERVER_ENDPOINT = "tcp://localhost:5555";

    public static void main(String[] argv)
{
try (ZContext ctx = new ZContext()) {
System.out.println("I: connecting to server");
Socket client = ctx.createSocket(SocketType.REQ);
assert (client != null);
client.connect(SERVER_ENDPOINT);
Poller poller = ctx.createPoller(1);
poller.register(client, Poller.POLLIN);
int sequence = 0;
int retriesLeft = REQUEST_RETRIES;
while (retriesLeft > 0 && !Thread.currentThread().isInterrupted()) {
// We send a request, then we work to get a reply
String request = String.format("%d", ++sequence);
client.send(request);
int expect_reply = 1;
while (expect_reply > 0) {
// Poll socket for a reply, with timeout
int rc = poller.poll(REQUEST_TIMEOUT);
if (rc == -1)
break; // Interrupted
// Here we process a server reply and exit our loop if the
                    // reply is valid. If we didn't get a reply we close the client
// socket and resend the request. We try a number of times
// before finally abandoning:
if (poller.pollin(0)) {
// We got a reply from the server, must match
// getSequence
String reply = client.recvStr();
if (reply == null)
break; // Interrupted
if (Integer.parseInt(reply) == sequence) {
System.out.printf(
"I: server replied OK (%s)\n", reply
);
retriesLeft = REQUEST_RETRIES;
expect_reply = 0;
}
else System.out.printf(
"E: malformed reply from server: %s\n", reply
);
}
                    else if (--retriesLeft == 0) {
System.out.println(
"E: server seems to be offline, abandoning\n"
);
break;
}
else {
System.out.println(
"W: no response from server, retrying\n"
);
// Old socket is confused; close it and open a new one
poller.unregister(client);
ctx.destroySocket(client);
System.out.println("I: reconnecting to server\n");
client = ctx.createSocket(SocketType.REQ);
client.connect(SERVER_ENDPOINT);
poller.register(client, Poller.POLLIN);
// Send request again, on new socket
client.send(request);
}
}
}
}
}
}
--
--  Lazy Pirate client
--  Use zmq_poll to do a safe request-reply
--  To run, start lpserver and then randomly kill/restart it
--
--  Author: Robert G. Jakabosky <bobby@sharedrealm.com>
--
require"zmq"
require"zmq.poller"
require"zhelpers"local REQUEST_TIMEOUT = 2500-- msecs, (> 1000!)local REQUEST_RETRIES = 3-- Before we abandon-- Helper function that returns a new configured socket-- connected to the Hello World server--localfunctions_client_socket(context)
printf ("I: connecting to server...\n")
local client = context:socket(zmq.REQ)
client:connect("tcp://localhost:5555")
-- Configure socket to not wait at close time
client:setopt(zmq.LINGER, 0)
return client
end
s_version_assert (2, 1)
local context = zmq.init(1)
local client = s_client_socket (context)
local sequence = 0
local retries_left = REQUEST_RETRIES
local expect_reply = true

local poller = zmq.poller(1)

local function client_cb()
    -- We got a reply from the server, must match sequence
    --local reply = assert(client:recv(zmq.NOBLOCK))
    local reply = client:recv()
if (tonumber(reply) == sequence) then
printf ("I: server replied OK (%s)\n", reply)
retries_left = REQUEST_RETRIES
        expect_reply = false
    else
        printf ("E: malformed reply from server: %s\n", reply)
    end
end
poller:add(client, zmq.POLLIN, client_cb)
while (retries_left > 0) do
    sequence = sequence + 1
    -- We send a request, then we work to get a reply
    local request = string.format("%d", sequence)
client:send(request)
    expect_reply = true
    while (expect_reply) do
        -- Poll socket for a reply, with timeout
        local cnt = assert(poller:poll(REQUEST_TIMEOUT * 1000))
        -- Check if there was no reply
        if (cnt == 0) then
            retries_left = retries_left - 1
            if (retries_left == 0) then
                printf ("E: server seems to be offline, abandoning\n")
                break
            else
printf ("W: no response from server, retrying...\n")
-- Old socket is confused; close it and open a new one
poller:remove(client)
client:close()
client = s_client_socket (context)
poller:add(client, zmq.POLLIN, client_cb)
-- Send request again, on new socket
client:send(request)
            end
        end
    end
end
client:close()
context:term()
# Lazy Pirate client in Perl
# Use poll to do a safe request-reply
# To run, start lpserver.pl then randomly kill/restart it

use strict;
use warnings;
use v5.10;

use ZMQ::FFI;
use ZMQ::FFI::Constants qw(ZMQ_REQ);

use EV;

my $REQUEST_TIMEOUT = 2500; # msecs
my $REQUEST_RETRIES = 3;    # Before we abandon
my $SERVER_ENDPOINT = 'tcp://localhost:5555';

my $ctx = ZMQ::FFI->new();

say 'I: connecting to server...';
my $client = $ctx->socket(ZMQ_REQ);
$client->connect($SERVER_ENDPOINT);

my $sequence = 0;
my $retries_left = $REQUEST_RETRIES;
REQUEST_LOOP:
while ($retries_left) {
    # We send a request, then we work to get a reply
    my $request = ++$sequence;
$client->send($request);
my$expect_reply = 1;
RETRY_LOOP:
while ($expect_reply) {
        # Poll socket for a reply, with timeout
        EV::once $client->get_fd, EV::READ, $REQUEST_TIMEOUT / 1000, sub {
            my ($revents) = @_;

            # Here we process a server reply and exit our loop if the
            # reply is valid. If we didn't get a reply we close the client
            # socket and resend the request. We try a number of times
            # before finally abandoning:
            if ($revents == EV::READ) {
                while ($client->has_pollin) {
                    # We got a reply from the server, must match sequence
                    my $reply = $client->recv();
if ($reply == $sequence) {
say "I: server replied OK ($reply)";
$retries_left = $REQUEST_RETRIES;
$expect_reply = 0;
}
else {
say "E: malformed reply from server: $reply";
}
}
}
elsif (--$retries_left == 0) {
say 'E: server seems to be offline, abandoning';
}
else {
say "W: no response from server, retrying...";
                # Old socket is confused; close it and open a new one
                $client->close;
say "reconnecting to server...";
$client = $ctx->socket(ZMQ_REQ);
$client->connect($SERVER_ENDPOINT);
                # Send request again, on new socket
                $client->send($request);
}
};
        last RETRY_LOOP if $retries_left == 0;
EV::run;
}
}
lpclient: Lazy Pirate client in PHP
<?php
/*
* Lazy Pirate client
* Use zmq_poll to do a safe request-reply
* To run, start lpserver and then randomly kill/restart it
*
* @author Ian Barber <ian(dot)barber(at)gmail(dot)com>
*/
define("REQUEST_TIMEOUT", 2500); // msecs, (> 1000!)
define("REQUEST_RETRIES", 3); // Before we abandon
/*
* Helper function that returns a new configured socket
* connected to the Hello World server
 */
function client_socket(ZMQContext $context)
{
echo"I: connecting to server...", PHP_EOL;
$client = new ZMQSocket($context,ZMQ::SOCKET_REQ);
$client->connect("tcp://localhost:5555");
// Configure socket to not wait at close time
$client->setSockOpt(ZMQ::SOCKOPT_LINGER, 0);
return$client;
}
$context = new ZMQContext();
$client = client_socket($context);
$sequence = 0;
$retries_left = REQUEST_RETRIES;
$read = $write = array();
while ($retries_left) {
// We send a request, then we work to get a reply
$client->send(++$sequence);
$expect_reply = true;
while ($expect_reply) {
// Poll socket for a reply, with timeout
$poll = new ZMQPoll();
$poll->add($client, ZMQ::POLL_IN);
$events = $poll->poll($read, $write, REQUEST_TIMEOUT);
// If we got a reply, process it
if ($events > 0) {
// We got a reply from the server, must match sequence
$reply = $client->recv();
if (intval($reply) == $sequence) {
printf ("I: server replied OK (%s)%s", $reply, PHP_EOL);
$retries_left = REQUEST_RETRIES;
$expect_reply = false;
} else {
printf ("E: malformed reply from server: %s%s", $reply, PHP_EOL);
}
} elseif (--$retries_left == 0) {
echo"E: server seems to be offline, abandoning", PHP_EOL;
break;
} else {
echo"W: no response from server, retrying...", PHP_EOL;
// Old socket will be confused; close it and open a new one
$client = client_socket($context);
// Send request again, on new socket
$client->send($sequence);
}
}
}
lpclient: Lazy Pirate client in Python
## Lazy Pirate client
# Use zmq_poll to do a safe request-reply
# To run, start lpserver and then randomly kill/restart it
#
# Author: Daniel Lundin <dln(at)eintr(dot)org>
#
import itertools
import logging
import sys

import zmq
logging.basicConfig(format="%(levelname)s: %(message)s", level=logging.INFO)
REQUEST_TIMEOUT = 2500
REQUEST_RETRIES = 3
SERVER_ENDPOINT = "tcp://localhost:5555"
context = zmq.Context()
logging.info("Connecting to server…")
client = context.socket(zmq.REQ)
client.connect(SERVER_ENDPOINT)
for sequence in itertools.count():
request = str(sequence).encode()
logging.info("Sending (%s)", request)
client.send(request)
retries_left = REQUEST_RETRIES
while True:
if (client.poll(REQUEST_TIMEOUT) & zmq.POLLIN) != 0:
reply = client.recv()
            if int(reply) == sequence:
                logging.info("Server replied OK (%s)", reply)
                retries_left = REQUEST_RETRIES
                break
            else:
logging.error("Malformed reply from server: %s", reply)
continue
retries_left -= 1
logging.warning("No response from server")
# Socket is confused. Close and remove it.
client.setsockopt(zmq.LINGER, 0)
client.close()
if retries_left == 0:
logging.error("Server seems to be offline, abandoning")
sys.exit()
logging.info("Reconnecting to server…")
# Create new connection
client = context.socket(zmq.REQ)
client.connect(SERVER_ENDPOINT)
logging.info("Resending (%s)", request)
client.send(request)
## Lazy Pirate client
# Use zmq_poll to do a safe request-reply
# To run, start lpserver and then randomly kill/restart it
#
package require zmq
set REQUEST_TIMEOUT 2500;# msecs, (> 1000!)
set REQUEST_RETRIES 3;# Before we abandon
set SERVER_ENDPOINT "tcp://localhost:5555"

zmq context context

puts "I: connecting to server..."
zmq socket client context REQ
client connect $SERVER_ENDPOINT

set sequence 0
set retries_left $REQUEST_RETRIES

while {$retries_left} {
    # We send a request, then we work to get a reply
    client send [incr sequence]
    set expect_reply 1
    while {$expect_reply} {
        # Poll socket for a reply, with timeout
        set rpoll_set [zmq poll {{client {POLLIN}}} $REQUEST_TIMEOUT]
        # If we got a reply, process it
        if {[llength $rpoll_set] && [lindex $rpoll_set 0 0] eq "client"} {
            # We got a reply from the server, must match sequence
            set reply [client recv]
            if {$reply eq $sequence} {
                puts "I: server replied OK ($reply)"
                set retries_left $REQUEST_RETRIES
                set expect_reply 0
            } else {
                puts "E: malformed reply from server: $reply"
            }
        } elseif {[incr retries_left -1] <= 0} {
            puts "E: server seems to be offline, abandoning"
            set retries_left 0
            break
        } else {
            puts "W: no response from server, retrying..."
            # Old socket is confused; close it and open a new one
            client close
            puts "I: connecting to server..."
            zmq socket client context REQ
            client connect $SERVER_ENDPOINT
            # Send request again, on new socket
            client send $sequence
        }
    }
}

client close
context term
program lpserver;
//
// Lazy Pirate server
// Binds REP socket to tcp://*:5555
// Like hwserver except:
// - echoes request as-is
// - randomly runs slowly, or exits to simulate a crash.
// @author Varga Balazs <bb.varga@gmail.com>
//
{$APPTYPE CONSOLE}
uses
SysUtils
, zmqapi
;
var
context: TZMQContext;
server: TZMQSocket;
cycles: Integer;
request: Utf8String;
begin
Randomize;
context := TZMQContext.create;
server := context.socket( stRep );
server.bind( 'tcp://*:5555' );
cycles := 0;
while not context.Terminated do
try
server.recv( request );
inc( cycles );
// Simulate various problems, after a few cycles
if ( cycles > 3 ) and ( random(3) = 0) then
begin
Writeln( 'I: simulating a crash' );
break;
end else
if ( cycles > 3 ) and ( random(3) = 0 ) then
begin
Writeln( 'I: simulating CPU overload' );
sleep(2000);
end;
Writeln( Format( 'I: normal request (%s)', [request] ) );
sleep (1000); // Do some heavy work
server.send( request );
except
end;
context.Free;
end.
--
--  Lazy Pirate server
--  Binds REP socket to tcp://*:5555
--  Like hwserver except:
--   - echoes request as-is
--   - randomly runs slowly, or exits to simulate a crash.
--
--  Author: Robert G. Jakabosky <bobby@sharedrealm.com>
--
require"zmq"
require"zhelpers"
math.randomseed(os.time())
local context = zmq.init(1)
local server = context:socket(zmq.REP)
server:bind("tcp://*:5555")
local cycles = 0
while true do
    local request = server:recv()
    cycles = cycles + 1
    -- Simulate various problems, after a few cycles
    if (cycles > 3 and randof (3) == 0) then
        printf("I: simulating a crash\n")
        break
    elseif (cycles > 3 and randof (3) == 0) then
printf("I: simulating CPU overload\n")
s_sleep(2000)
end
printf("I: normal request (%s)\n", request)
s_sleep(1000) -- Do some heavy work
server:send(request)
end
server:close()
context:term()
# Lazy Pirate server in Perl
# Binds REP socket to tcp://*:5555
# Like hwserver except:
#  - echoes request as-is
#  - randomly runs slowly, or exits to simulate a crash.

use strict;
use warnings;
use v5.10;

use ZMQ::FFI;
use ZMQ::FFI::Constants qw(ZMQ_REP);

my $context = ZMQ::FFI->new();
my $server = $context->socket(ZMQ_REP);
$server->bind('tcp://*:5555');

my $cycles = 0;
SERVER_LOOP:
while (1) {
my$request = $server->recv();
$cycles++;
    # Simulate various problems, after a few cycles
    if ($cycles > 3 && int(rand(3)) == 0) {
say "I: simulating a crash";
last SERVER_LOOP;
}
elsif ($cycles > 3 && int(rand(3)) == 0) {
say "I: simulating CPU overload";
        sleep 2;
}
say "I: normal request ($request)";
    sleep 1; # Do some heavy work
    $server->send($request);
}
lpserver: Lazy Pirate server in PHP
<?php
/*
* Lazy Pirate server
 * Binds REP socket to tcp://*:5555
* Like hwserver except:
* - echoes request as-is
* - randomly runs slowly, or exits to simulate a crash.
*
* @author Ian Barber <ian(dot)barber(at)gmail(dot)com>
 */

$context = new ZMQContext();
$server = new ZMQSocket($context, ZMQ::SOCKET_REP);
$server->bind("tcp://*:5555");
$cycles = 0;
while (true) {
$request = $server->recv();
$cycles++;
// Simulate various problems, after a few cycles
if ($cycles > 3 && rand(0, 3) == 0) {
echo"I: simulating a crash", PHP_EOL;
break;
} elseif ($cycles > 3 && rand(0, 3) == 0) {
echo"I: simulating CPU overload", PHP_EOL;
sleep(5);
}
printf ("I: normal request (%s)%s", $request, PHP_EOL);
sleep(1); // Do some heavy work
$server->send($request);
}
lpserver: Lazy Pirate server in Python
## Lazy Pirate server
# Binds REP socket to tcp://*:5555
# Like hwserver except:
#  - echoes request as-is
#  - randomly runs slowly, or exits to simulate a crash.
#
# Author: Daniel Lundin <dln(at)eintr(dot)org>
#
from random import randint

import itertools
import logging
import time

import zmq
logging.basicConfig(format="%(levelname)s: %(message)s", level=logging.INFO)
context = zmq.Context()
server = context.socket(zmq.REP)
server.bind("tcp://*:5555")
for cycles in itertools.count():
request = server.recv()
    # Simulate various problems, after a few cycles
    if cycles > 3 and randint(0, 3) == 0:
        logging.info("Simulating a crash")
        break
    elif cycles > 3 and randint(0, 3) == 0:
logging.info("Simulating CPU overload")
time.sleep(2)
logging.info("Normal request (%s)", request)
time.sleep(1) # Do some heavy work
server.send(request)
#!/usr/bin/env ruby

# Author: Han Holl <han.holl@pobox.com>

require 'rubygems'
require 'zmq'

class LPServer
  def initialize(connect)
    @ctx = ZMQ::Context.new(1)
    @socket = @ctx.socket(ZMQ::REP)
    @socket.bind(connect)
  end

  def run
    begin
      loop do
        rsl = yield @socket.recv
        @socket.send rsl
      end
    ensure
      @socket.close
      @ctx.close
    end
  end
end

if $0 == __FILE__
  cycles = 0
  srand
  LPServer.new(ARGV[0] || "tcp://*:5555").run do |request|
    cycles += 1
    if cycles > 3
      if rand(3) == 0
        puts "I: simulating a crash"
        break
      elsif rand(3) == 0
        puts "I: simulating CPU overload"
        sleep(3)
      end
    end
    puts "I: normal request (#{request})"
    sleep(1)
    request
  end
end
lpserver: Lazy Pirate server in Rust
use rand::{thread_rng, Rng};
use std::time::Duration;

fn main() {
    let context = zmq::Context::new();
    let server = context.socket(zmq::REP).unwrap();
    server.bind("tcp://*:5555").unwrap();

    let mut i = 0;
    loop {
        i += 1;
        let request = server.recv_msg(0).unwrap();
        println!("Got Request: {request:?}");
        server.send(request, 0).unwrap();
        std::thread::sleep(Duration::from_secs(1));
        if (i > 3) && (thread_rng().gen_range(0..3) == 0) {
            // simulate a crash
            println!("Oh no! Server crashed.");
            break;
        }
        if (i > 3) && (thread_rng().gen_range(0..3) == 0) {
            // simulate overload
            println!("Server is busy.");
            std::thread::sleep(Duration::from_secs(2));
        }
    }
}
lpserver: Lazy Pirate server in Scala
import org.zeromq.ZMQ;
import java.util.Random;
/*
* Lazy Pirate server
* @author Zac Li
* @email zac.li.cmu@gmail.com
 */
object lpserver {

  def main(args: Array[String]): Unit = {
    val rand = new Random(System.nanoTime())

    val context = ZMQ.context(1)
    val server = context.socket(ZMQ.REP)
    server.bind("tcp://*:5555")

    var cycles = 0
    var running = true
    while (running) {
      val request = server.recvStr()
      cycles += 1

      // Simulate various problems, after a few cycles
      if (cycles > 3 && rand.nextInt(3) == 0) {
        println("I: simulating a crash")
        running = false
      } else {
        if (cycles > 3 && rand.nextInt(3) == 0) {
          println("I: simulating CPU overload")
          Thread.sleep(2000)
        }
        println(s"I: normal request ($request)")
        Thread.sleep(1000)
        server.send(request)
      }
    }
    server.close()
    context.term()
  }
}
lpserver: Lazy Pirate server in Tcl
## Lazy Pirate server
# Binds REQ socket to tcp://*:5555
# Like hwserver except:
# - echoes request as-is
# - randomly runs slowly, or exits to simulate a crash.
#
package require zmq
expr{srand([pid])}zmq context context
zmq socket server context REP
server bind "tcp://*:5555"set cycles 0while{1}{set request [server recv]incr cycles
# Simulate various problems, after a few cycles
if{$cycles > 3 && int(rand()*3) == 0}{puts"I: simulating a crash"break;}elseif{$cycles > 3 && int(rand()*3) == 0}{puts"I: simulating CPU overload"after2000}puts"I: normal request ($request)"after1000;# Do some heavy work
server send $request}server close
context term
To run this test case, start the client and the server in two console windows. The server will randomly misbehave after a few messages. You can check the client’s response. Here is typical output from the server:
I: normal request (1)
I: normal request (2)
I: normal request (3)
I: simulating CPU overload
I: normal request (4)
I: simulating a crash
And here is the client’s response:
I: connecting to server...
I: server replied OK (1)
I: server replied OK (2)
I: server replied OK (3)
W: no response from server, retrying...
I: connecting to server...
W: no response from server, retrying...
I: connecting to server...
E: server seems to be offline, abandoning
The client sequences each message and checks that replies come back exactly in order: that no requests or replies are lost, and no replies come back more than once, or out of order. Run the test a few times until you’re convinced that this mechanism actually works. You don’t need sequence numbers in a production application; they just help us trust our design.
The client uses a REQ socket, and does the brute force close/reopen because REQ sockets impose that strict send/receive cycle. You might be tempted to use a DEALER instead, but it would not be a good decision. First, it would mean emulating the secret sauce that REQ does with envelopes (if you’ve forgotten what that is, it’s a good sign you don’t want to have to do it). Second, it would mean potentially getting back replies that you didn’t expect.
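If you're curious what the DEALER route would involve, here is a rough sketch (in Python; illustrative only, not one of the guide's canonical examples). You have to add the empty delimiter frame that REQ normally adds for you, and you have to be ready to discard late replies from requests you already retried or abandoned:

# Sketch: a Lazy-Pirate-style exchange over DEALER instead of REQ.
# DEALER won't add the empty delimiter frame, and it will deliver
# stale replies to requests we have already retried or abandoned.
import zmq

REQUEST_TIMEOUT = 2500   # msecs

context = zmq.Context()
client = context.socket(zmq.DEALER)
client.connect("tcp://localhost:5555")

sequence = 1
client.send_multipart([b"", str(sequence).encode()])   # explicit empty delimiter

while True:
    if client.poll(REQUEST_TIMEOUT) & zmq.POLLIN:
        empty, reply = client.recv_multipart()
        if int(reply) == sequence:
            print(f"I: server replied OK ({reply.decode()})")
            break
        # A stale reply from an earlier attempt: drop it and keep waiting
        print(f"W: discarding unexpected reply ({reply.decode()})")
    else:
        # No reply yet: resend on the same socket, no close/reopen needed
        print("W: no response from server, retrying...")
        client.send_multipart([b"", str(sequence).encode()])

The close/reopen dance disappears, but the envelope handling and duplicate-reply filtering you take on is exactly the bookkeeping REQ was doing for you.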
Handling failures only at the client works when we have a set of clients talking to a single server. It can handle a server crash, but only if recovery means restarting that same server. If there’s a permanent error, such as a dead power supply on the server hardware, this approach won’t work. Because the application code in servers is usually the biggest source of failures in any architecture, depending on a single server is not a great idea.
So, pros and cons:
Pro: simple to understand and implement.
Pro: works easily with existing client and server application code.
Pro: ZeroMQ automatically retries the actual reconnection until it works.
Con: doesn’t failover to backup or alternate servers.
Our second approach extends the Lazy Pirate pattern with a queue proxy that lets us talk, transparently, to multiple servers, which we can more accurately call “workers”. We’ll develop this in stages, starting with a minimal working model, the Simple Pirate pattern.
In all these Pirate patterns, workers are stateless. If the application requires some shared state, such as a shared database, we don’t know about it as we design our messaging framework. Having a queue proxy means workers can come and go without clients knowing anything about it. If one worker dies, another takes over. This is a nice, simple topology with only one real weakness, namely the central queue itself, which can become a problem to manage, and a single point of failure.
Figure 48 - The Simple Pirate Pattern
The basis for the queue proxy is the load balancing broker from
Chapter 3 - Advanced Request-Reply Patterns. What is the very minimum we need to do to handle dead or blocked workers? Turns out, it’s surprisingly little. We already have a retry mechanism in the client. So using the load balancing pattern will work pretty well. This fits with ZeroMQ’s philosophy that we can extend a peer-to-peer pattern like request-reply by plugging naive proxies in the middle.
We don’t need a special client; we’re still using the Lazy Pirate client. Here is the queue, which is identical to the main task of the load balancing broker:
// Simple Pirate broker
// This is identical to load-balancing pattern, with no reliability
// mechanisms. It depends on the client for recovery. Runs forever.
#include"czmq.h"#define WORKER_READY "\001" // Signals worker is ready
intmain (void)
{
zctx_t *ctx = zctx_new ();
void *frontend = zsocket_new (ctx, ZMQ_ROUTER);
void *backend = zsocket_new (ctx, ZMQ_ROUTER);
zsocket_bind (frontend, "tcp://*:5555"); // For clients
zsocket_bind (backend, "tcp://*:5556"); // For workers
// Queue of available workers
zlist_t *workers = zlist_new ();
// The body of this example is exactly the same as lbbroker2.
// .skip
while (true) {
zmq_pollitem_t items [] = {
{ backend, 0, ZMQ_POLLIN, 0 },
{ frontend, 0, ZMQ_POLLIN, 0 }
};
// Poll frontend only if we have available workers
int rc = zmq_poll (items, zlist_size (workers)? 2: 1, -1);
if (rc == -1)
break; // Interrupted
// Handle worker activity on backend
if (items [0].revents & ZMQ_POLLIN) {
// Use worker identity for load-balancing
zmsg_t *msg = zmsg_recv (backend);
if (!msg)
break; // Interrupted
zframe_t *identity = zmsg_unwrap (msg);
zlist_append (workers, identity);
// Forward message to client if it's not a READY
zframe_t *frame = zmsg_first (msg);
if (memcmp (zframe_data (frame), WORKER_READY, 1) == 0)
zmsg_destroy (&msg);
else
zmsg_send (&msg, frontend);
}
if (items [1].revents & ZMQ_POLLIN) {
// Get client request, route to first available worker
zmsg_t *msg = zmsg_recv (frontend);
if (msg) {
zmsg_wrap (msg, (zframe_t *) zlist_pop (workers));
zmsg_send (&msg, backend);
}
}
}
// When we're done, clean up properly
while (zlist_size (workers)) {
zframe_t *frame = (zframe_t *) zlist_pop (workers);
zframe_destroy (&frame);
}
zlist_destroy (&workers);
zctx_destroy (&ctx);
return 0;
// .until
}
spqueue: Simple Pirate queue in C++
//
// Simple Pirate queue
// This is identical to the LRU pattern, with no reliability mechanisms
// at all. It depends on the client for recovery. Runs forever.
//
// Andreas Hoelzlwimmer <andreas.hoelzlwimmer@fh-hagenberg.at
#include"zmsg.hpp"#include<queue>#define MAX_WORKERS 100
intmain (void)
{
s_version_assert (2, 1);
// Prepare our context and sockets
zmq::context_t context(1);
zmq::socket_t frontend (context, ZMQ_ROUTER);
zmq::socket_t backend (context, ZMQ_ROUTER);
frontend.bind("tcp://*:5555"); // For clients
backend.bind("tcp://*:5556"); // For workers
// Queue of available workers
std::queue<std::string> worker_queue;
while (1) {
zmq::pollitem_t items [] = {
{ backend, 0, ZMQ_POLLIN, 0 },
{ frontend, 0, ZMQ_POLLIN, 0 }
};
// Poll frontend only if we have available workers
if (worker_queue.size())
zmq::poll (items, 2, -1);
else
zmq::poll (items, 1, -1);
// Handle worker activity on backend
if (items [0].revents & ZMQ_POLLIN) {
zmsg zm(backend);
//zmsg_t *zmsg = zmsg_recv (backend);
// Use worker address for LRU routing
assert (worker_queue.size() < MAX_WORKERS);
worker_queue.push(zm.unwrap());
// Return reply to client if it's not a READY
if (strcmp (zm.address(), "READY") == 0)
zm.clear();
else
zm.send (frontend);
}
if (items [1].revents & ZMQ_POLLIN) {
// Now get next client request, route to next worker
zmsg zm(frontend);
// REQ socket in worker needs an envelope delimiter
zm.wrap(worker_queue.front().c_str(), "");
zm.send(backend);
// Dequeue and drop the next worker address
worker_queue.pop();
}
}
// We never exit the main loop
return 0;
}
program spqueue;
//
// Simple Pirate broker
// This is identical to load-balancing pattern, with no reliability
// mechanisms. It depends on the client for recovery. Runs forever.
// @author Varga Balazs <bb.varga@gmail.com>
//
{$APPTYPE CONSOLE}
uses
SysUtils
, zmqapi
;
const
WORKER_READY = '\001'; // Signals worker is ready
var
ctx: TZMQContext;
frontend,
backend: TZMQSocket;
workers: TZMQMsg;
poller: TZMQPoller;
pc: Integer;
msg: TZMQMsg;
identity,
frame: TZMQFrame;
begin
ctx := TZMQContext.create;
frontend := ctx.Socket( stRouter );
backend := ctx.Socket( stRouter );
frontend.bind( 'tcp://*:5555' ); // For clients
backend.bind( 'tcp://*:5556' ); // For workers
// Queue of available workers
workers := TZMQMsg.create;
poller := TZMQPoller.Create( true );
poller.Register( backend, [pePollIn] );
poller.Register( frontend, [pePollIn] );
// The body of this example is exactly the same as lbbroker2.
while not ctx.Terminated do
try
// Poll frontend only if we have available workers
if workers.size > 0 then
pc := 2
else
pc := 1;
poller.poll( 1000, pc );
// Handle worker activity on backend
if pePollIn in poller.PollItem[0].revents then
begin
// Use worker identity for load-balancing
backend.recv( msg );
identity := msg.unwrap;
workers.add( identity );
// Forward message to client if it's not a READY
frame := msg.first;
if frame.asUtf8String = WORKER_READY then
begin
msg.Free;
msg := nil;
end else
frontend.send( msg );
end;
if pePollIn in poller.PollItem[1].revents then
begin
// Get client request, route to first available worker
frontend.recv( msg );
msg.wrap( workers.pop );
backend.send( msg );
end;
except
end;
workers.Free;
ctx.Free;
end.
package ;
import haxe.Stack;
import neko.Lib;
import org.zeromq.ZFrame;
import org.zeromq.ZContext;
import org.zeromq.ZMQSocket;
import org.zeromq.ZMQPoller;
import org.zeromq.ZMQ;
import org.zeromq.ZMsg;
import org.zeromq.ZMQException;
/**
* Simple Pirate queue
* This is identical to the LRU pattern, with no reliability mechanisms
* at all. It depends on the client for recovery. Runs forever.
*
* @see http://zguide.zeromq.org/page:all#Basic-Reliable-Queuing-Simple-Pirate-Pattern
 */
class SPQueue
{
    // Signals workers are ready
    private static inline var LRU_READY:String = String.fromCharCode(1);

    public static function main() {
Lib.println("** SPQueue (see: http://zguide.zeromq.org/page:all#Basic-Reliable-Queuing-Simple-Pirate-Pattern)");
        // Prepare our context and sockets
        var context:ZContext = new ZContext();
var frontend:ZMQSocket = context.createSocket(ZMQ_ROUTER);
var backend:ZMQSocket = context.createSocket(ZMQ_ROUTER);
frontend.bind("tcp://*:5555"); // For clients
backend.bind("tcp://*:5556"); // For workers// Queue of available workersvar workerQueue:List<ZFrame> = new List<ZFrame>();
var poller:ZMQPoller = new ZMQPoller();
poller.registerSocket(backend, ZMQ.ZMQ_POLLIN());
while (true) {
poller.unregisterSocket(frontend);
if (workerQueue.length > 0) {
// Only poll frontend if there is at least 1 worker ready to do work
poller.registerSocket(frontend, ZMQ.ZMQ_POLLIN());
}
try {
poller.poll( -1 );
} catch (e:ZMQException) {
if (ZMQ.isInterrupted())
break;
trace("ZMQException #:" + e.errNo + ", str:" + e.str());
trace (Stack.toString(Stack.exceptionStack()));
}
if (poller.pollin(1)) {
                // Use worker address for LRU routing
                var msg = ZMsg.recvMsg(backend);
                if (msg == null)
                    break;    // Interrupted
                var address = msg.unwrap();
                workerQueue.add(address);
                // Forward message to client if it's not a READY
                var frame = msg.first();
if (frame.streq(LRU_READY))
msg.destroy();
else
msg.send(frontend);
}
if (poller.pollin(2)) {
                // Get client request, route to first available worker
                var msg = ZMsg.recvMsg(frontend);
if (msg != null) {
msg.wrap(workerQueue.pop());
msg.send(backend);
}
}
}
        // When we're done, clean up properly
        for (f in workerQueue) {
f.destroy();
}
context.destroy();
}
}
spqueue: Simple Pirate queue in Java
package guide;

import java.util.ArrayList;

import org.zeromq.*;
import org.zeromq.ZMQ.Poller;
import org.zeromq.ZMQ.Socket;
//
// Simple Pirate queue
// This is identical to load-balancing pattern, with no reliability mechanisms
// at all. It depends on the client for recovery. Runs forever.
//
public class spqueue
{
    private final static String WORKER_READY = "\001"; // Signals worker is ready

    public static void main(String[] args)
{
try (ZContext ctx = new ZContext()) {
Socket frontend = ctx.createSocket(SocketType.ROUTER);
Socket backend = ctx.createSocket(SocketType.ROUTER);
frontend.bind("tcp://*:5555"); // For clients
backend.bind("tcp://*:5556"); // For workers
// Queue of available workers
ArrayList<ZFrame> workers = new ArrayList<ZFrame>();
Poller poller = ctx.createPoller(2);
poller.register(backend, Poller.POLLIN);
poller.register(frontend, Poller.POLLIN);
// The body of this example is exactly the same as lruqueue2.
while (true) {
boolean workersAvailable = workers.size() > 0;
int rc = poller.poll(-1);
// Poll frontend only if we have available workers
if (rc == -1)
break; // Interrupted
// Handle worker activity on backend
if (poller.pollin(0)) {
// Use worker address for LRU routing
ZMsg msg = ZMsg.recvMsg(backend);
if (msg == null)
break; // Interrupted
ZFrame address = msg.unwrap();
workers.add(address);
// Forward message to client if it's not a READY
ZFrame frame = msg.getFirst();
if (new String(frame.getData(), ZMQ.CHARSET).equals(WORKER_READY))
msg.destroy();
else msg.send(frontend);
}
if (workersAvailable && poller.pollin(1)) {
// Get client request, route to first available worker
ZMsg msg = ZMsg.recvMsg(frontend);
if (msg != null) {
msg.wrap(workers.remove(0));
msg.send(backend);
}
}
}
// When we're done, clean up properly
while (workers.size() > 0) {
ZFrame frame = workers.remove(0);
frame.destroy();
}
workers.clear();
}
}
}
--
--  Simple Pirate queue
--  This is identical to the LRU pattern, with no reliability mechanisms
--  at all. It depends on the client for recovery. Runs forever.
--
--  Author: Robert G. Jakabosky <bobby@sharedrealm.com>
--
require"zmq"
require"zmq.poller"
require"zhelpers"
require"zmsg"local tremove = table.remove
local MAX_WORKERS = 100
s_version_assert (2, 1)
-- Prepare our context and sockets
local context = zmq.init(1)
local frontend = context:socket(zmq.ROUTER)
local backend = context:socket(zmq.ROUTER)
frontend:bind("tcp://*:5555"); -- For clients
backend:bind("tcp://*:5556"); -- For workers-- Queue of available workerslocal worker_queue = {}
local is_accepting = falselocal poller = zmq.poller(2)
localfunctionfrontend_cb()
-- Now get next client request, route to next workerlocal msg = zmsg.recv (frontend)
-- Dequeue a worker from the queue.local worker = tremove(worker_queue, 1)
msg:wrap(worker, "")
msg:send(backend)
    if (#worker_queue == 0) then
        -- stop accepting work from clients, when no workers are available.
        poller:remove(frontend)
        is_accepting = false
    end
end

-- Handle worker activity on backend
poller:add(backend, zmq.POLLIN, function()
local msg = zmsg.recv(backend)
-- Use worker address for LRU routing
worker_queue[#worker_queue + 1] = msg:unwrap()
    -- start accepting client requests, if we are not already doing so.
    if not is_accepting then
is_accepting = true
poller:add(frontend, zmq.POLLIN, frontend_cb)
    end
    -- Forward message to client if it's not a READY
    if (msg:address() ~= "READY") then
        msg:send(frontend)
    end
end)
-- start poller's event loop
poller:start()
-- We never exit the main loop
<?php
/*
* Simple Pirate queue
* This is identical to the LRU pattern, with no reliability mechanisms
* at all. It depends on the client for recovery. Runs forever.
*
* @author Ian Barber <ian(dot)barber(at)gmail(dot)com>
 */

include 'zmsg.php';
define("MAX_WORKERS", 100);
// Prepare our context and sockets
$context = new ZMQContext();
$frontend = $context->getSocket(ZMQ::SOCKET_ROUTER);
$backend = $context->getSocket(ZMQ::SOCKET_ROUTER);
$frontend->bind("tcp://*:5555"); // For clients
$backend->bind("tcp://*:5556"); // For workers
// Queue of available workers
$available_workers = 0;
$worker_queue = array();
$read = $write = array();
while (true) {
$poll = new ZMQPoll();
$poll->add($backend, ZMQ::POLL_IN);
// Poll frontend only if we have available workers
if ($available_workers) {
$poll->add($frontend, ZMQ::POLL_IN);
}
$events = $poll->poll($read, $write);
    foreach ($read as $socket) {
$zmsg = new Zmsg($socket);
$zmsg->recv();
// Handle worker activity on backend
if ($socket === $backend) {
// Use worker address for LRU routing
assert($available_workers < MAX_WORKERS);
array_push($worker_queue, $zmsg->unwrap());
$available_workers++;
// Return reply to client if it's not a READY
if ($zmsg->address() != "READY") {
$zmsg->set_socket($frontend)->send();
}
} elseif ($socket === $frontend) {
// Now get next client request, route to next worker
// REQ socket in worker needs an envelope delimiter
// Dequeue and drop the next worker address
$zmsg->wrap(array_shift($worker_queue), "");
$zmsg->set_socket($backend)->send();
$available_workers--;
}
}
// We never exit the main loop
}
spqueue: Simple Pirate queue in Python
## Simple Pirate queue
# This is identical to the LRU pattern, with no reliability mechanisms
# at all. It depends on the client for recovery. Runs forever.
#
# Author: Daniel Lundin <dln(at)eintr(dot)org>
#
import zmq

LRU_READY = b"\x01"
context = zmq.Context(1)
frontend = context.socket(zmq.ROUTER) # ROUTER
backend = context.socket(zmq.ROUTER) # ROUTER
frontend.bind("tcp://*:5555") # For clients
backend.bind("tcp://*:5556") # For workers
poll_workers = zmq.Poller()
poll_workers.register(backend, zmq.POLLIN)
poll_both = zmq.Poller()
poll_both.register(frontend, zmq.POLLIN)
poll_both.register(backend, zmq.POLLIN)
workers = []
while True:
if workers:
socks = dict(poll_both.poll())
else:
socks = dict(poll_workers.poll())
    # Handle worker activity on backend
    if socks.get(backend) == zmq.POLLIN:
        # Use worker address for LRU routing
        msg = backend.recv_multipart()
        if not msg:
break
address = msg[0]
workers.append(address)
# Everything after the second (delimiter) frame is reply
reply = msg[2:]
        # Forward message to client if it's not a READY
        if reply[0] != LRU_READY:
frontend.send_multipart(reply)
if socks.get(frontend) == zmq.POLLIN:
# Get client request, route to first available worker
msg = frontend.recv_multipart()
request = [workers.pop(0), ''.encode()] + msg
backend.send_multipart(request)
## Simple Pirate queue
# This is identical to the LRU pattern, with no reliability mechanisms
# at all. It depends on the client for recovery. Runs forever.
#
package require zmq
set LRU_READY "READY";# Signals worker is ready
# Prepare our context and sockets
zmq context context
zmq socket frontend context ROUTER
zmq socket backend context ROUTER
frontend bind "tcp://*:5555";# For clients
backend bind "tcp://*:5556";# For workers
# Queue of available workers
set workers {}

while {1} {
    if {[llength $workers]} {
        set poll_set [list [list backend [list POLLIN]] [list frontend [list POLLIN]]]
    } else {
        set poll_set [list [list backend [list POLLIN]]]
    }
    set rpoll_set [zmq poll $poll_set -1]
    foreach rpoll $rpoll_set {
        switch [lindex $rpoll 0] {
            backend {
                # Use worker address for LRU routing
                set msg [zmsg recv backend]
                set address [zmsg unwrap msg]
                lappend workers $address
                # Forward message to client if it's not a READY
                if {[lindex $msg 0] ne $LRU_READY} {
                    zmsg send frontend $msg
                }
            }
            frontend {
                # Get client request, route to first available worker
                set msg [zmsg recv frontend]
                set workers [lassign $workers worker]
                set msg [zmsg wrap $msg $worker]
                zmsg send backend $msg
            }
        }
    }
}

frontend close
backend close
context term
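Here is the worker, which takes the Lazy Pirate server and adapts it to the load balancing pattern: on startup it signals that it is ready, then it echoes each request back through the queue, randomly crashing or stalling after a few cycles. The sketch below is in Python (the Lua and Tcl versions follow); the \x01 READY byte matches the constant the queue uses:

# Simple Pirate worker (sketch)
# Connects a REQ socket to the queue's backend at tcp://localhost:5556,
# signals READY, then echoes requests, randomly simulating problems.
from random import randint
import time
import zmq

LRU_READY = b"\x01"        # Signals worker is ready

context = zmq.Context()
worker = context.socket(zmq.REQ)
# Set a random identity to make tracing easier
identity = ("%04X-%04X" % (randint(0, 0xFFFF), randint(0, 0xFFFF))).encode()
worker.setsockopt(zmq.IDENTITY, identity)
worker.connect("tcp://localhost:5556")

print(f"I: ({identity.decode()}) worker ready")
worker.send(LRU_READY)

cycles = 0
while True:
    # The message still carries the client's envelope; echo it back intact
    msg = worker.recv_multipart()
    cycles += 1
    # Simulate various problems, after a few cycles
    if cycles > 3 and randint(0, 4) == 0:
        print(f"I: ({identity.decode()}) simulating a crash")
        break
    elif cycles > 3 and randint(0, 4) == 0:
        print(f"I: ({identity.decode()}) simulating CPU overload")
        time.sleep(3)
    print(f"I: ({identity.decode()}) normal reply")
    time.sleep(1)          # Do some heavy work
    worker.send_multipart(msg)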
--
--  Simple Pirate worker
--  Connects REQ socket to tcp://*:5556
--  Implements worker part of LRU queueing
--
--  Author: Robert G. Jakabosky <bobby@sharedrealm.com>
--
require"zmq"
require"zmsg"
math.randomseed(os.time())
local context = zmq.init(1)
local worker = context:socket(zmq.REQ)
-- Set random identity to make tracing easier
local identity = string.format("%04X-%04X", randof (0x10000), randof (0x10000))
worker:setopt(zmq.IDENTITY, identity)
worker:connect("tcp://localhost:5556")
-- Tell queue we're ready for work
printf ("I: (%s) worker ready\n", identity)
worker:send("READY")
local cycles = 0
while true do
    local msg = zmsg.recv (worker)
    -- Simulate various problems, after a few cycles
    cycles = cycles + 1
    if (cycles > 3 and randof (5) == 0) then
        printf ("I: (%s) simulating a crash\n", identity)
        break
    elseif (cycles > 3 and randof (5) == 0) then
printf ("I: (%s) simulating CPU overload\n", identity)
s_sleep (5000)
end
printf ("I: (%s) normal reply - %s\n",
identity, msg:body())
s_sleep (1000) -- Do some heavy work
msg:send(worker)
end
worker:close()
context:term()
## Simple Pirate worker
# Connects REQ socket to tcp://*:5556
# Implements worker part of LRU queueing
#
package require zmq
set LRU_READY "READY";# Signals worker is ready
expr {srand([pid])}

zmq context context
zmq socket worker context REQ

# Set random identity to make tracing easier
set identity [format "%04X-%04X" [expr {int(rand()*0x10000)}] [expr {int(rand()*0x10000)}]]
worker setsockopt IDENTITY $identity
worker connect "tcp://localhost:5556"

# Tell broker we're ready for work
puts "I: ($identity) worker ready"
worker send $LRU_READY

set cycles 0
while {1} {
    set msg [zmsg recv worker]

    # Simulate various problems, after a few cycles
    incr cycles
    if {$cycles > 3 && [expr {int(rand()*5)}] == 0} {
        puts "I: ($identity) simulating a crash"
        break
    } elseif {$cycles > 3 && [expr {int(rand()*5)}] == 0} {
        puts "I: ($identity) simulating CPU overload"
        after 3000
    }
    puts "I: ($identity) normal reply"
    after 1000 ;# Do some heavy work
    zmsg send worker $msg
}

worker close
context term
To test this, start a handful of workers, a Lazy Pirate client, and the queue, in any order. You’ll see that the workers eventually all crash and burn, and the client retries and then gives up. The queue never stops, and you can restart workers and clients ad nauseam. This model works with any number of clients and workers.
The Simple Pirate Queue pattern works pretty well, especially because it’s just a combination of two existing patterns. Still, it does have some weaknesses:
It’s not robust in the face of a queue crash and restart. The client will recover, but the workers won’t. While ZeroMQ will reconnect the workers’ sockets automatically, as far as the newly restarted queue is concerned the workers haven’t signaled ready, so they don’t exist. To fix this, we have to do heartbeating from queue to worker so that the worker can detect when the queue has gone away.
The queue does not detect worker failure, so if a worker dies while idle, the queue can’t remove it from its worker queue until the queue sends it a request. The client waits and retries for nothing. It’s not a critical problem, but it’s not nice. To make this work properly, we do heartbeating from worker to queue, so that the queue can detect a lost worker at any stage.
We’ll fix these in a properly pedantic Paranoid Pirate Pattern.
We previously used a REQ socket for the worker. For the Paranoid Pirate worker, we’ll switch to a DEALER socket. This has the advantage of letting us send and receive messages at any time, rather than the lock-step send/receive that REQ imposes. The downside of DEALER is that we have to do our own envelope management (re-read
Chapter 3 - Advanced Request-Reply Patterns for background on this concept).
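To make "doing our own envelope management" concrete, here is a minimal sketch in plain libzmq (the endpoint and the echo-only behavior are assumptions for illustration; the real worker appears below). With DEALER, every frame that arrives, including the routing envelope the queue prepended, is ours to send back, so this loop echoes each frame and preserves the MORE flag:
// Sketch: a DEALER peer handles the reply envelope itself. Echoing
// every frame, MORE flags included, returns the envelope untouched
// along with the body.
#include <zmq.h>

int main (void)
{
    void *ctx = zmq_ctx_new ();
    void *worker = zmq_socket (ctx, ZMQ_DEALER);
    zmq_connect (worker, "tcp://localhost:5556");

    while (1) {
        zmq_msg_t frame;
        zmq_msg_init (&frame);
        if (zmq_msg_recv (&frame, worker, 0) == -1)
            break;                          // Interrupted
        int more = zmq_msg_more (&frame);   // Is another frame coming?
        zmq_msg_send (&frame, worker, more? ZMQ_SNDMORE: 0);
        zmq_msg_close (&frame);
    }
    zmq_close (worker);
    zmq_ctx_destroy (ctx);
    return 0;
}
A REQ socket would have hidden the empty delimiter frame from us; DEALER hands us everything and adds nothing on the way out.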
We’re still using the Lazy Pirate client. Here is the Paranoid Pirate queue proxy:
package ;
import haxe.Stack;
import neko.Lib;
import org.zeromq.ZFrame;
import org.zeromq.ZContext;
import org.zeromq.ZMQSocket;
import org.zeromq.ZMQPoller;
import org.zeromq.ZMQ;
import org.zeromq.ZMsg;
import org.zeromq.ZMQException;
/**
* Paranoid Pirate Queue
*
* @see http://zguide.zeromq.org/page:all#Robust-Reliable-Queuing-Paranoid-Pirate-Pattern
*
* Author: rsmith (at) rsbatechnology (dot) co (dot) uk
 */
class PPQueue
{
private static inline var HEARTBEAT_LIVENESS = 3;
private static inline var HEARTBEAT_INTERVAL = 1000;    // msecs
private static inline var PPP_READY = String.fromCharCode(1);
private static inline var PPP_HEARTBEAT = String.fromCharCode(2);
public static function main() {
Lib.println("** PPQueue (see: http://zguide.zeromq.org/page:all#Robust-Reliable-Queuing-Paranoid-Pirate-Pattern)");
// Prepare our context and sockets
var context:ZContext = new ZContext();
var frontend:ZMQSocket = context.createSocket(ZMQ_ROUTER);
var backend:ZMQSocket = context.createSocket(ZMQ_ROUTER);
frontend.bind("tcp://*:5555");    // For clients
backend.bind("tcp://*:5556");     // For workers
// Queue of available workers
var workerQueue = new WorkerQueue(HEARTBEAT_LIVENESS, HEARTBEAT_INTERVAL);
// Send out heartbeats at regular intervals
var heartbeatAt = Date.now().getTime() + HEARTBEAT_INTERVAL;
var poller = new ZMQPoller();
while (true) {
poller.unregisterAllSockets();
poller.registerSocket(backend, ZMQ.ZMQ_POLLIN());
// Only poll frontend clients if we have at least one worker to do stuff
if (workerQueue.size() > 0) {
poller.registerSocket(frontend, ZMQ.ZMQ_POLLIN());
}
try {
poller.poll(HEARTBEAT_INTERVAL * 1000);
} catch (e:ZMQException) {
if (ZMQ.isInterrupted())
break;
trace("ZMQException #:" + e.errNo + ", str:" + e.str());
trace (Stack.toString(Stack.exceptionStack()));
}
// Handle worker activity
if (poller.pollin(1)) {
// Use worker addressFrame for LRU routing
var msg = ZMsg.recvMsg(backend);
if (msg == null)
break;    // Interrupted
// Any sign of life from worker means it's ready
var addressFrame = msg.unwrap();
var identity = addressFrame.toString();
// Validate control message, or return reply to client
if (msg.size() == 1) {
var frame = msg.first();
if (frame.streq(PPP_READY)) {
workerQueue.delete(identity);
workerQueue.append(addressFrame, identity);
} else if (frame.streq(PPP_HEARTBEAT)) {
workerQueue.refresh(identity);
} else {
Lib.println("E: invalid message from worker");
Lib.println(msg.toString());
}
msg.destroy();
} else {
msg.send(frontend);
workerQueue.append(addressFrame, identity);
}
}
if (poller.pollin(2)) {
// Now get next client request, route to next worker
var msg = ZMsg.recvMsg(frontend);
if (msg == null)
break;    // Interrupted
var worker = workerQueue.dequeue();
msg.push(worker.addressFrame.duplicate());
msg.send(backend);
}
// Send heartbeats to idle workers if it's time
if (Date.now().getTime() >= heartbeatAt) {
for (w in workerQueue) {
var msg = new ZMsg();
// Add a duplicate of the stored worker addressFrame, to prevent the
// w.addressFrame ZFrame object from being destroyed when msg is sent
msg.add(w.addressFrame.duplicate());
msg.addString(PPP_HEARTBEAT);
msg.send(backend);
}
heartbeatAt = Date.now().getTime() + HEARTBEAT_INTERVAL;
}
workerQueue.purge();
}
// When we're done, clean up properly
context.destroy();
}
}
typedef WorkerT = {
addressFrame:ZFrame,
identity:String,
expiry:Float // in msecs since 1 Jan 1970
};
/**
* Internal class managing a queue of workerQueue
 */
private class WorkerQueue {
// Stores hash of worker heartbeat expiries, keyed by worker identity
private var queue:List<WorkerT>;
private var heartbeatLiveness:Int;
private var heartbeatInterval:Int;
/**
 * Constructor
 * @param liveness
 * @param interval
 */
public function new(liveness:Int, interval:Int) {
queue = new List<WorkerT>();
heartbeatLiveness = liveness;
heartbeatInterval = interval;
}
// Implement Iterable typedef signature
public function iterator():Iterator<WorkerT> {
return queue.iterator();
}
/**
* Insert worker at end of queue, reset expiry
* Worker must not already be in queue
* @param identity
 */
public function append(addressFrame:ZFrame, identity:String) {
if (get(identity) != null)
Lib.println("E: duplicate worker identity " + identity);
else
queue.add({addressFrame:addressFrame, identity:identity, expiry:generateExpiry()});
}
/**
* Remove worker from queue, if present
* @param identity
 */
public function delete(identity:String) {
var w = get(identity);
if (w != null) {
queue.remove(w);
}
}
public function refresh(identity:String) {
var w = get(identity);
if (w == null)
Lib.println("E: worker " + identity + " not ready");
else
w.expiry = generateExpiry();
}
/**
* Pop next worker off queue, return WorkerT
* @param identity
 */
public function dequeue():WorkerT {
return queue.pop();
}
/**
* Look for & kill expired workerQueue
 */
public function purge() {
for (w in queue) {
if (Date.now().getTime() > w.expiry) {
queue.remove(w);
}
}
}
/**
* Return the size of this worker Queue
* @return
 */
public function size():Int {
return queue.length;
}
/**
* Returns a WorkerT anon object if exists in the queue, else null
* @param identity
* @return
 */
private function get(identity:String):WorkerT {
for (w in queue) {
if (w.identity == identity)
return w;
}
return null;    // nothing found
}
private inline function generateExpiry():Float {
return Date.now().getTime() + heartbeatInterval * heartbeatLiveness;
}
}
ppqueue: Paranoid Pirate queue in Java
package guide;

import java.util.ArrayList;
import java.util.Iterator;

import org.zeromq.*;
import org.zeromq.ZMQ.Poller;
import org.zeromq.ZMQ.Socket;
//
// Paranoid Pirate queue
//
public class ppqueue
{
private final static int HEARTBEAT_LIVENESS = 3;     // 3-5 is reasonable
private final static int HEARTBEAT_INTERVAL = 1000;  // msecs

// Paranoid Pirate Protocol constants
private final static String PPP_READY = "\001";      // Signals worker is ready
private final static String PPP_HEARTBEAT = "\002";  // Signals worker heartbeat

// Here we define the worker class; a structure and a set of functions that
// act as constructor, destructor, and methods on worker objects:
private static class Worker
{
ZFrame address; // Address of worker
String identity; // Printable identity
long expiry; // Expires at this time
protected Worker(ZFrame address)
{
this.address = address;
identity = new String(address.getData(), ZMQ.CHARSET);
expiry = System.currentTimeMillis() + HEARTBEAT_INTERVAL * HEARTBEAT_LIVENESS;
}
// The ready method puts a worker to the end of the ready list:
protected void ready(ArrayList<Worker> workers)
{
Iterator<Worker> it = workers.iterator();
while (it.hasNext()) {
Worker worker = it.next();
if (identity.equals(worker.identity)) {
it.remove();
break;
}
}
workers.add(this);
}
// The next method returns the next available worker address:
protected static ZFrame next(ArrayList<Worker> workers)
{
Worker worker = workers.remove(0);
assert (worker != null);
ZFrame frame = worker.address;
return frame;
}
// The purge method looks for and kills expired workers. We hold workers
// from oldest to most recent, so we stop at the first alive worker:
protected static void purge(ArrayList<Worker> workers)
{
Iterator<Worker> it = workers.iterator();
while (it.hasNext()) {
Worker worker = it.next();
if (System.currentTimeMillis() < worker.expiry) {
break;
}
it.remove();
}
}
};
// The main task is an LRU queue with heartbeating on workers so we can
// detect crashed or blocked worker tasks:
public static void main(String[] args)
{
try (ZContext ctx = new ZContext()) {
Socket frontend = ctx.createSocket(SocketType.ROUTER);
Socket backend = ctx.createSocket(SocketType.ROUTER);
frontend.bind("tcp://*:5555"); // For clients
backend.bind("tcp://*:5556"); // For workers
// List of available workers
ArrayList<Worker> workers = new ArrayList<Worker>();
// Send out heartbeats at regular intervals
long heartbeat_at = System.currentTimeMillis() + HEARTBEAT_INTERVAL;
Poller poller = ctx.createPoller(2);
poller.register(backend, Poller.POLLIN);
poller.register(frontend, Poller.POLLIN);
while (true) {
boolean workersAvailable = workers.size() > 0;
int rc = poller.poll(HEARTBEAT_INTERVAL);
if (rc == -1)
break; // Interrupted
// Handle worker activity on backend
if (poller.pollin(0)) {
// Use worker address for LRU routing
ZMsg msg = ZMsg.recvMsg(backend);
if (msg == null)
break; // Interrupted
// Any sign of life from worker means it's ready
ZFrame address = msg.unwrap();
Worker worker = new Worker(address);
worker.ready(workers);
// Validate control message, or return reply to client
if (msg.size() == 1) {
ZFrame frame = msg.getFirst();
String data = new String(frame.getData(), ZMQ.CHARSET);
if (!data.equals(PPP_READY) &&
!data.equals(PPP_HEARTBEAT)) {
System.out.println(
"E: invalid message from worker"
);
msg.dump(System.out);
}
msg.destroy();
}
else msg.send(frontend);
}
if (workersAvailable && poller.pollin(1)) {
// Now get next client request, route to next worker
ZMsg msg = ZMsg.recvMsg(frontend);
if (msg == null)
break; // Interrupted
msg.push(Worker.next(workers));
msg.send(backend);
}
// We handle heartbeating after any socket activity. First we
// send heartbeats to any idle workers if it's time. Then we
// purge any dead workers:
if (System.currentTimeMillis() >= heartbeat_at) {
for (Worker worker : workers) {
worker.address.send(
backend, ZFrame.REUSE + ZFrame.MORE
);
ZFrame frame = new ZFrame(PPP_HEARTBEAT);
frame.send(backend, 0);
}
long now = System.currentTimeMillis();
heartbeat_at = now + HEARTBEAT_INTERVAL;
}
Worker.purge(workers);
}
// When we're done, clean up properly
workers.clear();
}
}
}
--
--  Paranoid Pirate queue
--
--  Author: Robert G. Jakabosky <bobby@sharedrealm.com>
--
require "zmq"
require "zmq.poller"
require "zmsg"

local MAX_WORKERS = 100
local HEARTBEAT_LIVENESS = 3     -- 3-5 is reasonable
local HEARTBEAT_INTERVAL = 1000  -- msecs

local tremove = table.remove

-- Insert worker at end of queue, reset expiry
-- Worker must not already be in queue
local function s_worker_append(queue, identity)
if queue[identity] then
printf ("E: duplicate worker identity %s", identity)
else
assert (#queue < MAX_WORKERS)
queue[identity] = s_clock() + HEARTBEAT_INTERVAL * HEARTBEAT_LIVENESS
queue[#queue + 1] = identity
end
end

-- Remove worker from queue, if present
local function s_worker_delete(queue, identity)
for i=1,#queue do
if queue[i] == identity then
tremove(queue, i)
break
end
end
queue[identity] = nil
end

-- Reset worker expiry, worker must be present
local function s_worker_refresh(queue, identity)
if queue[identity] then
queue[identity] = s_clock() + HEARTBEAT_INTERVAL * HEARTBEAT_LIVENESS
else
printf("E: worker %s not ready\n", identity)
end
end

-- Pop next available worker off queue, return identity
local function s_worker_dequeue(queue)
assert (#queue > 0)
local identity = tremove(queue, 1)
queue[identity] = nil
return identity
end

-- Look for & kill expired workers
local function s_queue_purge(queue)
local curr_clock = s_clock()
-- Work backwards from end to simplify removal
for i=#queue,1,-1 do
local id = queue[i]
if (curr_clock > queue[id]) then
tremove(queue, i)
queue[id] = nil
end
end
end
s_version_assert (2, 1)
-- Prepare our context and sockets
local context = zmq.init(1)
local frontend = context:socket(zmq.ROUTER)
local backend = context:socket(zmq.ROUTER)
frontend:bind("tcp://*:5555"); -- For clients
backend:bind("tcp://*:5556"); -- For workers-- Queue of available workerslocal queue = {}
local is_accepting = false-- Send out heartbeats at regular intervalslocal heartbeat_at = s_clock() + HEARTBEAT_INTERVAL
local poller = zmq.poller(2)
local function frontend_cb()
-- Now get next client request, route to next worker
local msg = zmsg.recv(frontend)
local identity = s_worker_dequeue (queue)
msg:push(identity)
msg:send(backend)
if (#queue == 0) then
-- stop accepting work from clients, when no workers are available.
poller:remove(frontend)
is_accepting = false
end
end

-- Handle worker activity on backend
poller:add(backend, zmq.POLLIN, function()
local msg = zmsg.recv(backend)
local identity = msg:unwrap()
-- Return reply to client if it's not a control message
if (msg:parts() == 1) then
if (msg:address() == "READY") then
s_worker_delete(queue, identity)
s_worker_append(queue, identity)
elseif (msg:address() == "HEARTBEAT") then
s_worker_refresh(queue, identity)
else
printf("E: invalid message from %s\n", identity)
msg:dump()
end
else
-- reply for client.
msg:send(frontend)
s_worker_append(queue, identity)
end
-- start accepting client requests, if we are not already doing so.
if not is_accepting and #queue > 0 then
is_accepting = true
poller:add(frontend, zmq.POLLIN, frontend_cb)
end
end)

-- start poller's event loop
while true do
local cnt = assert(poller:poll(HEARTBEAT_INTERVAL * 1000))
-- Send heartbeats to idle workers if it's time
if (s_clock() > heartbeat_at) then
for i=1,#queue do
local msg = zmsg.new("HEARTBEAT")
msg:wrap(queue[i], nil)
msg:send(backend)
end
heartbeat_at = s_clock() + HEARTBEAT_INTERVAL
end
s_queue_purge(queue)
end

-- We never exit the main loop
-- But pretend to do the right shutdown anyhow
while (#queue > 0) do
s_worker_dequeue(queue)
end
frontend:close()
backend:close()
## Paranoid Pirate queue
#
package require zmq
set HEARTBEAT_LIVENESS 3;# 3-5 is reasonable
set HEARTBEAT_INTERVAL 1;# secs
# Paranoid Pirate Protocol constants
set PPP_READY "READY";# Signals worker is ready
set PPP_HEARTBEAT "HEARTBEAT";# Signals worker heartbeat
# This defines one active worker in our worker list
# dict with keys address, identity and expiry
# Construct new worker
proc s_worker_new {address} {
    global HEARTBEAT_LIVENESS HEARTBEAT_INTERVAL
    return [dict create address $address identity $address expiry [expr {[clock seconds] + $HEARTBEAT_INTERVAL * $HEARTBEAT_LIVENESS}]]
}

# Worker is ready, remove if on list and move to end
proc s_worker_ready {self workersnm} {
    upvar $workersnm workers
    set nworkers {}
    foreach worker $workers {
        if {[dict get $self identity] ne [dict get $worker identity]} {
            lappend nworkers $worker
        }
    }
    lappend nworkers $self
    set workers $nworkers
}

# Return next available worker address
proc s_workers_next {workersnm} {
    upvar $workersnm workers
    set workers [lassign $workers worker]
    return [dict get $worker address]
}

# Look for & kill expired workers. Workers are oldest to most recent,
# so we stop at the first alive worker.
proc s_workers_purge {workersnm} {
    upvar $workersnm workers
    set nworkers {}
    foreach worker $workers {
        if {[clock seconds] < [dict get $worker expiry]} {
            # Worker is alive
            lappend nworkers $worker
        }
    }
    set workers $nworkers
}

set ctx [zmq context context]

zmq socket frontend $ctx ROUTER
zmq socket backend $ctx ROUTER
frontend bind "tcp://*:5555";# For clients
backend bind "tcp://*:5556";# For workers
# List of available workers
set workers {}

# Send out heartbeats at regular intervals
set heartbeat_at [expr {[clock seconds] + $HEARTBEAT_INTERVAL}]

while {1} {
    if {[llength $workers]} {
        set poll_set [list [list backend [list POLLIN]] [list frontend [list POLLIN]]]
    } else {
        set poll_set [list [list backend [list POLLIN]]]
    }
    set rpoll_set [zmq poll $poll_set $HEARTBEAT_INTERVAL]
    foreach rpoll $rpoll_set {
        switch [lindex $rpoll 0] {
            backend {
                # Handle worker activity on backend
                # Use worker address for LRU routing
                set msg [zmsg recv backend]

                # Any sign of life from worker means it's ready
                set address [zmsg unwrap msg]
                set worker [s_worker_new $address]
                s_worker_ready $worker workers

                # Validate control message, or return reply to client
                if {[llength $msg] == 1} {
                    if {[lindex $msg 0] ne $PPP_READY && [lindex $msg 0] ne $PPP_HEARTBEAT} {
                        puts "E: invalid message from worker"
                        zmsg dump $msg
                    }
                } else {
                    zmsg send frontend $msg
                }
            }
            frontend {
                # Now get next client request, route to next worker
                set msg [zmsg recv frontend]
                set msg [zmsg push $msg [s_workers_next workers]]
                zmsg send backend $msg
            }
        }
    }
    # Send heartbeats to idle workers if it's time
    if {[clock seconds] >= $heartbeat_at} {
        puts "I: heartbeat ([llength $workers])"
        foreach worker $workers {
            backend sendmore [dict get $worker address]
            backend send $PPP_HEARTBEAT
        }
        set heartbeat_at [expr {[clock seconds] + $HEARTBEAT_INTERVAL}]
    }
    s_workers_purge workers
}

frontend close
backend close
$ctx term
The queue extends the load balancing pattern with heartbeating of workers. Heartbeating is one of those “simple” things that can be difficult to get right. I’ll explain more about that in a second.
// Paranoid Pirate worker
#include"czmq.h"#define HEARTBEAT_LIVENESS 3 // 3-5 is reasonable
#define HEARTBEAT_INTERVAL 1000 // msecs
#define INTERVAL_INIT 1000 // Initial reconnect
#define INTERVAL_MAX 32000 // After exponential backoff
// Paranoid Pirate Protocol constants
#define PPP_READY "\001" // Signals worker is ready
#define PPP_HEARTBEAT "\002" // Signals worker heartbeat
// Helper function that returns a new configured socket
// connected to the Paranoid Pirate queue
static void *
s_worker_socket (zctx_t *ctx) {
void *worker = zsocket_new (ctx, ZMQ_DEALER);
zsocket_connect (worker, "tcp://localhost:5556");
// Tell queue we're ready for work
printf ("I: worker ready\n");
zframe_t *frame = zframe_new (PPP_READY, 1);
zframe_send (&frame, worker, 0);
return worker;
}
// .split main task
// We have a single task that implements the worker side of the
// Paranoid Pirate Protocol (PPP). The interesting parts here are
// the heartbeating, which lets the worker detect if the queue has
// died, and vice versa:
int main (void)
{
zctx_t *ctx = zctx_new ();
void *worker = s_worker_socket (ctx);
// If liveness hits zero, queue is considered disconnected
size_t liveness = HEARTBEAT_LIVENESS;
size_t interval = INTERVAL_INIT;
// Send out heartbeats at regular intervals
uint64_t heartbeat_at = zclock_time () + HEARTBEAT_INTERVAL;
srandom ((unsigned) time (NULL));
int cycles = 0;
while (true) {
zmq_pollitem_t items [] = { { worker, 0, ZMQ_POLLIN, 0 } };
int rc = zmq_poll (items, 1, HEARTBEAT_INTERVAL * ZMQ_POLL_MSEC);
if (rc == -1)
break; // Interrupted
if (items [0].revents & ZMQ_POLLIN) {
// Get message
// - 3-part envelope + content -> request
// - 1-part HEARTBEAT -> heartbeat
zmsg_t *msg = zmsg_recv (worker);
if (!msg)
break; // Interrupted
// .split simulating problems
// To test the robustness of the queue implementation we
// simulate various typical problems, such as the worker
// crashing or running very slowly. We do this after a few
// cycles so that the architecture can get up and running
// first:
if (zmsg_size (msg) == 3) {
cycles++;
if (cycles > 3 && randof (5) == 0) {
printf ("I: simulating a crash\n");
zmsg_destroy (&msg);
break;
}
else if (cycles > 3 && randof (5) == 0) {
printf ("I: simulating CPU overload\n");
sleep (3);
if (zctx_interrupted)
break;
}
printf ("I: normal reply\n");
zmsg_send (&msg, worker);
liveness = HEARTBEAT_LIVENESS;
sleep (1); // Do some heavy work
if (zctx_interrupted)
break;
}
else
// .split handle heartbeats
// When we get a heartbeat message from the queue, it means the
// queue was (recently) alive, so we must reset our liveness
// indicator:
if (zmsg_size (msg) == 1) {
zframe_t *frame = zmsg_first (msg);
if (memcmp (zframe_data (frame), PPP_HEARTBEAT, 1) == 0)
liveness = HEARTBEAT_LIVENESS;
else {
printf ("E: invalid message\n");
zmsg_dump (msg);
}
zmsg_destroy (&msg);
}
else {
printf ("E: invalid message\n");
zmsg_dump (msg);
}
interval = INTERVAL_INIT;
}
else
// .split detecting a dead queue
// If the queue hasn't sent us heartbeats in a while, destroy the
// socket and reconnect. This is the simplest most brutal way of
// discarding any messages we might have sent in the meantime:
if (--liveness == 0) {
printf ("W: heartbeat failure, can't reach queue\n");
printf ("W: reconnecting in %zd msec...\n", interval);
zclock_sleep (interval);
if (interval < INTERVAL_MAX)
interval *= 2;
zsocket_destroy (ctx, worker);
worker = s_worker_socket (ctx);
liveness = HEARTBEAT_LIVENESS;
}
// Send heartbeat to queue if it's time
if (zclock_time () > heartbeat_at) {
heartbeat_at = zclock_time () + HEARTBEAT_INTERVAL;
printf ("I: worker heartbeat\n");
zframe_t *frame = zframe_new (PPP_HEARTBEAT, 1);
zframe_send (&frame, worker, 0);
}
}
zctx_destroy (&ctx);
return 0;
}
ppworker: Paranoid Pirate worker in C++
//
// Paranoid Pirate worker
//
//
// Andreas Hoelzlwimmer <andreas.hoelzlwimmer@fh-hagenberg.at>
//
#include"zmsg.hpp"#include<iomanip>#define HEARTBEAT_LIVENESS 3 // 3-5 is reasonable
#define HEARTBEAT_INTERVAL 1000 // msecs
#define INTERVAL_INIT 1000 // Initial reconnect
#define INTERVAL_MAX 32000 // After exponential backoff
// Helper function that returns a new configured socket
// connected to the Hello World server
//
std::string identity;
static zmq::socket_t *
s_worker_socket (zmq::context_t &context) {
zmq::socket_t * worker = new zmq::socket_t(context, ZMQ_DEALER);
// Set random identity to make tracing easier
identity = s_set_id(*worker);
worker->connect ("tcp://localhost:5556");
// Configure socket to not wait at close time
int linger = 0;
worker->setsockopt (ZMQ_LINGER, &linger, sizeof (linger));
// Tell queue we're ready for work
std::cout << "I: (" << identity << ") worker ready" << std::endl;
s_send (*worker, std::string("READY"));
return worker;
}
int main (void)
{
s_version_assert (4, 0);
srandom ((unsigned) time (NULL));
zmq::context_t context (1);
zmq::socket_t * worker = s_worker_socket (context);
// If liveness hits zero, queue is considered disconnected
size_t liveness = HEARTBEAT_LIVENESS;
size_t interval = INTERVAL_INIT;
// Send out heartbeats at regular intervals
int64_t heartbeat_at = s_clock () + HEARTBEAT_INTERVAL;
int cycles = 0;
while (1) {
zmq::pollitem_t items[] = {
{static_cast<void*>(*worker), 0, ZMQ_POLLIN, 0 } };
zmq::poll (items, 1, HEARTBEAT_INTERVAL);
if (items [0].revents & ZMQ_POLLIN) {
// Get message
// - 3-part envelope + content -> request
// - 1-part "HEARTBEAT" -> heartbeat
zmsg msg (*worker);
if (msg.parts () == 3) {
// Simulate various problems, after a few cycles
cycles++;
if (cycles > 3 && within (5) == 0) {
std::cout << "I: (" << identity << ") simulating a crash" << std::endl;
msg.clear ();
break;
}
else {
if (cycles > 3 && within (5) == 0) {
std::cout << "I: (" << identity << ") simulating CPU overload" << std::endl;
sleep (5);
}
}
std::cout << "I: (" << identity << ") normal reply - " << msg.body() << std::endl;
msg.send (*worker);
liveness = HEARTBEAT_LIVENESS;
sleep (1); // Do some heavy work
}
else {
if (msg.parts () == 1
&& strcmp (msg.body (), "HEARTBEAT") == 0) {
liveness = HEARTBEAT_LIVENESS;
}
else {
std::cout << "E: (" << identity << ") invalid message" << std::endl;
msg.dump ();
}
}
interval = INTERVAL_INIT;
}
else if (--liveness == 0) {
std::cout << "W: (" << identity << ") heartbeat failure, can't reach queue" << std::endl;
std::cout << "W: (" << identity << ") reconnecting in " << interval << " msec..." << std::endl;
s_sleep (interval);
if (interval < INTERVAL_MAX) {
interval *= 2;
}
delete worker;
worker = s_worker_socket (context);
liveness = HEARTBEAT_LIVENESS;
}
// Send heartbeat to queue if it's time
if (s_clock () > heartbeat_at) {
heartbeat_at = s_clock () + HEARTBEAT_INTERVAL;
std::cout << "I: (" << identity << ") worker heartbeat" << std::endl;
s_send (*worker, std::string("HEARTBEAT"));
}
}
delete worker;
return 0;
}
package ;
import haxe.Stack;
import neko.Lib;
import neko.Sys;
import org.zeromq.ZContext;
import org.zeromq.ZFrame;
import org.zeromq.ZMQ;
import org.zeromq.ZMQPoller;
import org.zeromq.ZMQSocket;
import org.zeromq.ZMQException;
import org.zeromq.ZMsg;
import org.zeromq.ZSocket;
/**
* Paranoid Pirate worker
*
* @see http://zguide.zeromq.org/page:all#Robust-Reliable-Queuing-Paranoid-Pirate-Pattern
*
* Author: rsmith (at) rsbatechnology (dot) co (dot) uk
 */
class PPWorker
{
private static inline var HEARTBEAT_LIVENESS = 3;
private static inline var HEARTBEAT_INTERVAL = 1000;   // msecs
private static inline var INTERVAL_INIT = 1000;        // Initial reconnect
private static inline var INTERVAL_MAX = 32000;        // After exponential backoff

private static inline var PPP_READY = String.fromCharCode(1);
private static inline var PPP_HEARTBEAT = String.fromCharCode(2);
/**
* Helper function that returns a new configured socket
 * connected to the Paranoid Pirate queue
* @param ctx
* @return
 */
private static function workerSocket(ctx:ZContext):ZMQSocket {
var worker = ctx.createSocket(ZMQ_DEALER);
worker.connect("tcp://localhost:5556");
// Tell queue we're ready for work
Lib.println("I: worker ready");
ZFrame.newStringFrame(PPP_READY).send(worker);
return worker;
}
public static function main() {
Lib.println("** PPWorker (see: http://zguide.zeromq.org/page:all#Robust-Reliable-Queuing-Paranoid-Pirate-Pattern)");
var ctx = new ZContext();
var worker = workerSocket(ctx);
// If liveness hits zero, queue is considered disconnected
var liveness = HEARTBEAT_LIVENESS;
var interval = INTERVAL_INIT;
// Send out heartbeats at regular intervals
var heartbeatAt = Date.now().getTime() + HEARTBEAT_INTERVAL;
var cycles = 0;
var poller = new ZMQPoller();
poller.registerSocket(worker, ZMQ.ZMQ_POLLIN());
while (true) {
try {
poller.poll(HEARTBEAT_INTERVAL * 1000);
} catch (e:ZMQException) {
if (ZMQ.isInterrupted())
break;
trace("ZMQException #:" + e.errNo + ", str:" + e.str());
trace (Stack.toString(Stack.exceptionStack()));
}
if (poller.pollin(1)) {
// Get message
// - 3-part envelope + content -> request
// - 1-part HEARTBEAT -> heartbeat
var msg = ZMsg.recvMsg(worker);
if (msg == null)
break;    // Interrupted
if (msg.size() == 3) {
// Simulate various problems, after a few cycles
cycles++;
if (cycles > 3 && ZHelpers.randof(5) == 0) {
Lib.println("I: simulating a crash");
msg.destroy();
break;
} else if (cycles > 3 && ZHelpers.randof(5) == 0) {
Lib.println("I: simulating CPU overload");
Sys.sleep(3.0);
if (ZMQ.isInterrupted())
break;
}
Lib.println("I: normal reply");
msg.send(worker);
liveness = HEARTBEAT_LIVENESS;
Sys.sleep(1.0);    // Do some heavy work
if (ZMQ.isInterrupted())
break;
} else if (msg.size() == 1) {
var frame = msg.first();
if (frame.streq(PPP_HEARTBEAT))
liveness = HEARTBEAT_LIVENESS;
else {
Lib.println("E: invalid message");
Lib.println(msg.toString());
}
msg.destroy();
} else {
Lib.println("E: invalid message");
Lib.println(msg.toString());
}
interval = INTERVAL_INIT;
} else if (--liveness == 0) {
Lib.println("W: heartbeat failure, can't reach queue");
Lib.println("W: reconnecting in " + interval + " msec...");
Sys.sleep(interval / 1000.0);
if (interval < INTERVAL_MAX)
interval *= 2;
ctx.destroySocket(worker);
worker = workerSocket(ctx);
poller.unregisterAllSockets();
poller.registerSocket(worker, ZMQ.ZMQ_POLLIN());
liveness = HEARTBEAT_LIVENESS;
}
// Send heartbeat to queue if it's time
if (Date.now().getTime() > heartbeatAt) {
heartbeatAt = Date.now().getTime() + HEARTBEAT_INTERVAL;
Lib.println("I: worker heartbeat");
ZFrame.newStringFrame(PPP_HEARTBEAT).send(worker);
}
}
ctx.destroy();
}
}
ppworker: Paranoid Pirate worker in Java
package guide;

import java.util.Random;

import org.zeromq.*;
import org.zeromq.ZMQ.Poller;
import org.zeromq.ZMQ.Socket;
//
// Paranoid Pirate worker
//
public class ppworker
{
private final static int HEARTBEAT_LIVENESS = 3;     // 3-5 is reasonable
private final static int HEARTBEAT_INTERVAL = 1000;  // msecs
private final static int INTERVAL_INIT = 1000;       // Initial reconnect
private final static int INTERVAL_MAX = 32000;       // After exponential backoff

// Paranoid Pirate Protocol constants
private final static String PPP_READY = "\001";      // Signals worker is ready
private final static String PPP_HEARTBEAT = "\002";  // Signals worker heartbeat
// Helper function that returns a new configured socket
// connected to the Paranoid Pirate queue
private static Socket worker_socket(ZContext ctx)
{
Socket worker = ctx.createSocket(SocketType.DEALER);
worker.connect("tcp://localhost:5556");
// Tell queue we're ready for work
System.out.println("I: worker ready\n");
ZFrame frame = new ZFrame(PPP_READY);
frame.send(worker, 0);
return worker;
}
// We have a single task, which implements the worker side of the
// Paranoid Pirate Protocol (PPP). The interesting parts here are
// the heartbeating, which lets the worker detect if the queue has
// died, and vice-versa:
public static void main(String[] args)
{
try (ZContext ctx = new ZContext()) {
Socket worker = worker_socket(ctx);
Poller poller = ctx.createPoller(1);
poller.register(worker, Poller.POLLIN);
// If liveness hits zero, queue is considered disconnected
int liveness = HEARTBEAT_LIVENESS;
int interval = INTERVAL_INIT;
// Send out heartbeats at regular intervals
long heartbeat_at = System.currentTimeMillis() + HEARTBEAT_INTERVAL;
Random rand = new Random(System.nanoTime());
int cycles = 0;
while (true) {
int rc = poller.poll(HEARTBEAT_INTERVAL);
if (rc == -1)
break; // Interrupted
if (poller.pollin(0)) {
// Get message
// - 3-part envelope + content -> request
// - 1-part HEARTBEAT -> heartbeat
ZMsg msg = ZMsg.recvMsg(worker);
if (msg == null)
break; // Interrupted
// To test the robustness of the queue implementation we
// simulate various typical problems, such as the worker
// crashing, or running very slowly. We do this after a few
// cycles so that the architecture can get up and running
// first:
if (msg.size() == 3) {
cycles++;
if (cycles > 3 && rand.nextInt(5) == 0) {
System.out.println("I: simulating a crash\n");
msg.destroy();
msg = null;
break;
}
else if (cycles > 3 && rand.nextInt(5) == 0) {
System.out.println("I: simulating CPU overload\n");
try {
Thread.sleep(3000);
}
catch (InterruptedException e) {
break;
}
}
System.out.println("I: normal reply\n");
msg.send(worker);
liveness = HEARTBEAT_LIVENESS;
try {
Thread.sleep(1000);
}
catch (InterruptedException e) {
break;
} // Do some heavy work
}
else
// When we get a heartbeat message from the queue, it
// means the queue was (recently) alive, so reset our
// liveness indicator:
if (msg.size() == 1) {
ZFrame frame = msg.getFirst();
String frameData = new String(
frame.getData(), ZMQ.CHARSET
);
if (PPP_HEARTBEAT.equals(frameData))
liveness = HEARTBEAT_LIVENESS;
else {
System.out.println("E: invalid message\n");
msg.dump(System.out);
}
msg.destroy();
}
else {
System.out.println("E: invalid message\n");
msg.dump(System.out);
}
interval = INTERVAL_INIT;
}
else
// If the queue hasn't sent us heartbeats in a while,
// destroy the socket and reconnect. This is the simplest
// most brutal way of discarding any messages we might have
// sent in the meantime.
if (--liveness == 0) {
System.out.println(
"W: heartbeat failure, can't reach queue\n"
);
System.out.printf(
"W: reconnecting in %sd msec\n", interval
);
try {
Thread.sleep(interval);
}
catch (InterruptedException e) {
e.printStackTrace();
}
if (interval < INTERVAL_MAX)
interval *= 2;
ctx.destroySocket(worker);
worker = worker_socket(ctx);
liveness = HEARTBEAT_LIVENESS;
}
// Send heartbeat to queue if it's time
if (System.currentTimeMillis() > heartbeat_at) {
long now = System.currentTimeMillis();
heartbeat_at = now + HEARTBEAT_INTERVAL;
System.out.println("I: worker heartbeat\n");
ZFrame frame = new ZFrame(PPP_HEARTBEAT);
frame.send(worker, 0);
}
}
}
}
}
The code includes simulation of failures, as before. This makes it (a) very hard to debug, and (b) dangerous to reuse. When you want to debug this, disable the failure simulation.
The worker uses a reconnect strategy similar to the one we designed for the Lazy Pirate client, with two major differences: (a) it does an exponential back-off, and (b) it retries indefinitely (whereas the client retries a few times before reporting a failure).
Try the client, queue, and workers together, for example by using a script like this:
ppqueue &
for i in 1 2 3 4; do
ppworker &
sleep 1
done
lpclient &
You should see the workers die one-by-one as they simulate a crash, and the client eventually give up. You can stop and restart the queue and both client and workers will reconnect and carry on. And no matter what you do to queues and workers, the client will never get an out-of-order reply: the whole chain either works, or the client abandons.
Heartbeating solves the problem of knowing whether a peer is alive or dead. This is not an issue specific to ZeroMQ. TCP has a long timeout (30 minutes or so), which means that it can be impossible to know whether a peer has died, been disconnected, or gone on a weekend to Prague with a case of vodka, a redhead, and a large expense account.
It’s not easy to get heartbeating right. When writing the Paranoid Pirate examples, it took about five hours to get the heartbeating working properly. The rest of the request-reply chain took perhaps ten minutes. It is especially easy to create “false failures”, i.e., when peers decide that they are disconnected because the heartbeats aren’t sent properly.
We’ll look at the three main answers people use for heartbeating with ZeroMQ.
The most common approach is to do no heartbeating at all and hope for the best. Many if not most ZeroMQ applications do this. ZeroMQ encourages this by hiding peers in many cases. What problems does this approach cause?
When we use a ROUTER socket in an application that tracks peers, as peers disconnect and reconnect, the application will leak memory (resources that the application holds for each peer) and get slower and slower.
When we use SUB- or DEALER-based data recipients, we can’t tell the difference between good silence (there’s no data) and bad silence (the other end died). When a recipient knows the other side died, it can for example switch over to a backup route.
If we use a TCP connection that stays silent for a long while, it will, in some networks, just die. Sending something (technically a “keep-alive” rather than a heartbeat) will keep the network alive; the sketch below shows one way to do that.
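If the only goal is to stop an idle TCP connection from being dropped, you can also ask libzmq for TCP keep-alives rather than inventing application traffic. A sketch, assuming libzmq 3.x or later (whether and how the values are honored depends on the operating system; this is a network-level keep-alive, not an application heartbeat):
// Sketch: enable OS-level TCP keep-alives on a ZeroMQ socket
#include <zmq.h>

int main (void)
{
    void *ctx = zmq_ctx_new ();
    void *dealer = zmq_socket (ctx, ZMQ_DEALER);

    int keepalive = 1;      // Turn TCP keep-alives on
    int idle = 120;         // Seconds of silence before the first probe
    int interval = 60;      // Seconds between probes
    zmq_setsockopt (dealer, ZMQ_TCP_KEEPALIVE, &keepalive, sizeof (keepalive));
    zmq_setsockopt (dealer, ZMQ_TCP_KEEPALIVE_IDLE, &idle, sizeof (idle));
    zmq_setsockopt (dealer, ZMQ_TCP_KEEPALIVE_INTVL, &interval, sizeof (interval));

    zmq_connect (dealer, "tcp://localhost:5555");
    // ... use the socket as usual ...
    zmq_close (dealer);
    zmq_ctx_destroy (ctx);
    return 0;
}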
A second option is to send a heartbeat message from each node to its peers every second or so. When one node hears nothing from another within some timeout (several seconds, typically), it will treat that peer as dead. Sounds good, right? Sadly, no. This works in some cases but has nasty edge cases in others.
For pub-sub, this does work, and it’s the only model you can use. SUB sockets cannot talk back to PUB sockets, but PUB sockets can happily send “I’m alive” messages to their subscribers.
As an optimization, you can send heartbeats only when there is no real data to send. Furthermore, you can send heartbeats progressively slower and slower, if network activity is an issue (e.g., on mobile networks where activity drains the battery). As long as the recipient can detect a failure (sharp stop in activity), that’s fine.
Here are the typical problems with this design:
It can be inaccurate when we send large amounts of data, as heartbeats will be delayed behind that data. If heartbeats are delayed, you can get false timeouts and disconnections due to network congestion. Thus, always treat any incoming data as a heartbeat, whether or not the sender optimizes out heartbeats.
While the pub-sub pattern will drop messages for disappeared recipients, PUSH and DEALER sockets will queue them. So if you send heartbeats to a dead peer and it comes back, it will get all the heartbeats you sent, which can be thousands. Whoa, whoa!
This design assumes that heartbeat timeouts are the same across the whole network. But that won’t be accurate. Some peers will want very aggressive heartbeating in order to detect faults rapidly. And some will want very relaxed heartbeating, in order to let sleeping networks lie and save power.
The third option is to use a ping-pong dialog. One peer sends a ping command to the other, which replies with a pong command. Neither command has any payload. Pings and pongs are not correlated. Because the roles of “client” and “server” are arbitrary in some networks, we usually specify that either peer can in fact send a ping and expect a pong in response. However, because the timeouts depend on network topologies known best to dynamic clients, it is usually the client that pings the server.
This works for all ROUTER-based brokers. The same optimizations we used in the second model make this work even better: treat any incoming data as a pong, and only send a ping when not otherwise sending data.
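Here is a sketch of the client half of such a ping-pong dialog. It is illustrative only: the PING command string, the endpoint, and the three-miss budget are assumptions, not part of any protocol in this chapter. Any incoming traffic counts as a pong, and we only ping when the line has gone quiet:
// Sketch: ping when idle, treat any traffic as proof of life
#include <zmq.h>

int main (void)
{
    void *ctx = zmq_ctx_new ();
    void *server = zmq_socket (ctx, ZMQ_DEALER);
    zmq_connect (server, "tcp://localhost:6000");

    int liveness = 3;                       // Missed-reply budget
    while (liveness > 0) {
        zmq_pollitem_t items [] = { { server, 0, ZMQ_POLLIN, 0 } };
        zmq_poll (items, 1, 1000);          // 1-second interval (msecs on libzmq >= 3.x)
        if (items [0].revents & ZMQ_POLLIN) {
            char buffer [256];
            if (zmq_recv (server, buffer, sizeof (buffer), 0) >= 0)
                liveness = 3;               // Pong, or real data: peer is alive
        }
        else {
            zmq_send (server, "PING", 4, 0);
            liveness--;                     // Count down until we give up
        }
    }
    // Peer presumed dead; a real client would reconnect here rather than exit
    zmq_close (server);
    zmq_ctx_destroy (ctx);
    return 0;
}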
For Paranoid Pirate, we chose the second approach. It might not have been the simplest option: if designing this today, I’d probably try a ping-pong approach instead. However, the principles are similar. The heartbeat messages flow asynchronously in both directions, and either peer can decide the other is “dead” and stop talking to it.
In the worker, this is how we handle heartbeats from the queue:
We calculate a liveness, which is how many heartbeats we can still miss before deciding the queue is dead. It starts at three and we decrement it each time we miss a heartbeat.
We wait, in the zmq_poll loop, for one second each time, which is our heartbeat interval.
If there’s any message from the queue during that time, we reset our liveness to three.
If there’s no message during that time, we count down our liveness.
If the liveness reaches zero, we consider the queue dead.
If the queue is dead, we destroy our socket, create a new one, and reconnect.
To avoid opening and closing too many sockets, we wait for a certain interval before reconnecting, and we double the interval each time until it reaches 32 seconds.
And this is how we handle heartbeats to the queue:
We calculate when to send the next heartbeat; this is a single variable because we’re talking to one peer, the queue.
In the zmq_poll loop, whenever we pass this time, we send a heartbeat to the queue.
Here’s the essential heartbeating code for the worker:
#define HEARTBEAT_LIVENESS 3 // 3-5 is reasonable
#define HEARTBEAT_INTERVAL 1000 // msecs
#define INTERVAL_INIT 1000 // Initial reconnect
#define INTERVAL_MAX 32000 // After exponential backoff
...
// If liveness hits zero, queue is considered disconnected
size_t liveness = HEARTBEAT_LIVENESS;
size_t interval = INTERVAL_INIT;
// Send out heartbeats at regular intervals
uint64_t heartbeat_at = zclock_time () + HEARTBEAT_INTERVAL;
while (true) {
zmq_pollitem_t items [] = { { worker, 0, ZMQ_POLLIN, 0 } };
int rc = zmq_poll (items, 1, HEARTBEAT_INTERVAL * ZMQ_POLL_MSEC);
if (items [0].revents & ZMQ_POLLIN) {
// Receive any message from queue
liveness = HEARTBEAT_LIVENESS;
interval = INTERVAL_INIT;
}
else if (--liveness == 0) {
zclock_sleep (interval);
if (interval < INTERVAL_MAX)
interval *= 2;
zsocket_destroy (ctx, worker);
...
liveness = HEARTBEAT_LIVENESS;
}
// Send heartbeat to queue if it's time
if (zclock_time () > heartbeat_at) {
heartbeat_at = zclock_time () + HEARTBEAT_INTERVAL;
// Send heartbeat message to queue
}
}
The queue does the same, but manages an expiration time for each worker.
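In outline, the queue side's bookkeeping looks like the sketch below. It mirrors the ppqueue listings above, but the names and the coarse millisecond clock are illustrative; the real examples use zclock_time and CZMQ containers. Workers are held oldest-first, any message from a worker refreshes its expiry, and purging drops the timed-out entries at the front of the list:
// Sketch of per-worker expiry on the queue side (illustrative names)
#include <stdint.h>
#include <string.h>
#include <time.h>

#define HEARTBEAT_LIVENESS  3
#define HEARTBEAT_INTERVAL  1000        // msecs

typedef struct {
    char     identity [256];            // Printable worker identity
    uint64_t expiry;                    // Worker expires at this time
} worker_t;

static uint64_t clock_msecs (void) {
    return (uint64_t) time (NULL) * 1000;   // Coarse; real code uses zclock_time ()
}

// Any sign of life from a worker pushes its expiry into the future
static void worker_refresh (worker_t *self) {
    self->expiry = clock_msecs () + HEARTBEAT_INTERVAL * HEARTBEAT_LIVENESS;
}

// Workers are oldest-first, so expired entries cluster at the front
static size_t workers_purge (worker_t *workers, size_t count) {
    size_t expired = 0;
    while (expired < count && clock_msecs () >= workers [expired].expiry)
        expired++;                      // Oldest entries that have timed out
    memmove (workers, workers + expired, (count - expired) * sizeof (worker_t));
    return count - expired;             // New size of the worker list
}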
Here are some tips for your own heartbeating implementation:
Use zmq_poll or a reactor as the core of your application’s main task.
Start by building the heartbeating between peers, test it by simulating failures, and then build the rest of the message flow. Adding heartbeating afterwards is much trickier.
Use simple tracing, i.e., print to console, to get this working. To help you trace the flow of messages between peers, use a dump method such as zmsg offers, and number your messages incrementally so you can see if there are gaps.
In a real application, heartbeating must be configurable and usually negotiated with the peer. Some peers will want aggressive heartbeating, as low as 10 msecs. Other peers will be far away and want heartbeating as high as 30 seconds.
If you have different heartbeat intervals for different peers, your poll timeout should be the lowest (shortest time) of these. Do not use an infinite timeout.
Do heartbeating on the same socket you use for messages, so your heartbeats also act as a keep-alive to stop the network connection from going stale (some firewalls can be unkind to silent connections).
If you’re paying attention, you’ll realize that Paranoid Pirate is not interoperable with Simple Pirate, because of the heartbeats. But how do we define “interoperable”? To guarantee interoperability, we need a kind of contract, an agreement that lets different teams in different times and places write code that is guaranteed to work together. We call this a “protocol”.
It’s fun to experiment without specifications, but that’s not a sensible basis for real applications. What happens if we want to write a worker in another language? Do we have to read code to see how things work? What if we want to change the protocol for some reason? Even a simple protocol will, if it’s successful, evolve and become more complex.
Lack of contracts is a sure sign of a disposable application. So let’s write a contract for this protocol. How do we do that?
There’s a wiki at
rfc.zeromq.org that we made especially as a home for public ZeroMQ contracts.
To create a new specification, register on the wiki if needed, and follow the instructions. It’s fairly straightforward, though writing technical texts is not everyone’s cup of tea.
It took me about fifteen minutes to draft the new
Pirate Pattern Protocol. It’s not a big specification, but it does capture enough to act as the basis for arguments (“your queue isn’t PPP compatible; please fix it!”).
Turning PPP into a real protocol would take more work:
There should be a protocol version number in the READY command so that it’s possible to distinguish between different versions of PPP.
Right now, READY and HEARTBEAT are not entirely distinct from requests and replies. To make them distinct, we would need a message structure that includes a “message type” part; a hypothetical sketch follows below.
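As a purely hypothetical sketch of those two changes (neither the "PPP02" version frame nor this layout is part of the actual PPP specification), a versioned, self-describing READY might look like this on the wire:
// Hypothetical only: lead each command with a protocol/version frame
// and an explicit message-type frame
#include <zmq.h>

static void s_send_ready (void *worker)
{
    zmq_send (worker, "PPP02", 5, ZMQ_SNDMORE);     // Protocol name + version
    zmq_send (worker, "\001", 1, 0);                // Message type: READY
}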
The nice thing about progress is how fast it happens when lawyers and committees aren’t involved. The
one-page MDP specification turns PPP into something more solid. This is how we should design complex architectures: start by writing down the contracts, and only then write software to implement them.
The Majordomo Protocol (MDP) extends and improves on PPP in one interesting way: it adds a “service name” to requests that the client sends, and asks workers to register for specific services. Adding service names turns our Paranoid Pirate queue into a service-oriented broker. The nice thing about MDP is that it came out of working code, a simpler ancestor protocol (PPP), and a precise set of improvements that each solved a clear problem. This made it easy to draft.
To implement Majordomo, we need to write a framework for clients and workers. It’s really not sane to ask every application developer to read the spec and make it work, when they could be using a simpler API that does the work for them.
So while our first contract (MDP itself) defines how the pieces of our distributed architecture talk to each other, our second contract defines how user applications talk to the technical framework we’re going to design.
Majordomo has two halves, a client side and a worker side. Because we’ll write both client and worker applications, we will need two APIs. Here is a sketch for the client API, using a simple object-oriented approach:
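In C, that sketch amounts to a handful of functions, matching the mdcliapi implementation shown below (mdcli_t is an opaque structure; zmsg_t comes from CZMQ):
// mdcli API sketch: connect, configure, send, destroy
mdcli_t *mdcli_new         (char *broker, int verbose);
void     mdcli_destroy     (mdcli_t **self_p);
void     mdcli_set_timeout (mdcli_t *self, int timeout);    // msecs
void     mdcli_set_retries (mdcli_t *self, int retries);
zmsg_t  *mdcli_send        (mdcli_t *self, char *service, zmsg_t **request_p);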
That’s it. We open a session to the broker, send a request message, get a reply message back, and eventually close the connection. Here’s a sketch for the worker API:
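Again as a C sketch, mirroring the mdwrkapi implementation shown below:
// mdwrk API sketch: connect, configure, then loop on recv
mdwrk_t *mdwrk_new           (char *broker, char *service, int verbose);
void     mdwrk_destroy       (mdwrk_t **self_p);
void     mdwrk_set_heartbeat (mdwrk_t *self, int heartbeat);    // msecs
void     mdwrk_set_reconnect (mdwrk_t *self, int reconnect);    // msecs
zmsg_t  *mdwrk_recv          (mdwrk_t *self, zmsg_t **reply_p);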
It’s more or less symmetrical, but the worker dialog is a little different. The first time a worker does a recv(), it passes a null reply. Thereafter, it passes the current reply, and gets a new request.
The client and worker APIs were fairly simple to construct because they’re heavily based on the Paranoid Pirate code we already developed. Here is the client API:
// mdcliapi class - Majordomo Protocol Client API
// Implements the MDP/Worker spec at http://rfc.zeromq.org/spec:7.
#include"mdcliapi.h"// Structure of our class
// We access these properties only via class methods
struct _mdcli_t {
zctx_t *ctx; // Our context
char *broker;
void *client; // Socket to broker
int verbose; // Print activity to stdout
int timeout; // Request timeout
int retries; // Request retries
};
// Connect or reconnect to broker
void s_mdcli_connect_to_broker (mdcli_t *self)
{
if (self->client)
zsocket_destroy (self->ctx, self->client);
self->client = zsocket_new (self->ctx, ZMQ_REQ);
zmq_connect (self->client, self->broker);
if (self->verbose)
zclock_log ("I: connecting to broker at %s...", self->broker);
}
// .split constructor and destructor
// Here we have the constructor and destructor for our class:
// Constructor
mdcli_t *
mdcli_new (char *broker, int verbose)
{
assert (broker);
mdcli_t *self = (mdcli_t *) zmalloc (sizeof (mdcli_t));
self->ctx = zctx_new ();
self->broker = strdup (broker);
self->verbose = verbose;
self->timeout = 2500; // msecs
self->retries = 3; // Before we abandon
s_mdcli_connect_to_broker (self);
return self;
}
// Destructor
void mdcli_destroy (mdcli_t **self_p)
{
assert (self_p);
if (*self_p) {
mdcli_t *self = *self_p;
zctx_destroy (&self->ctx);
free (self->broker);
free (self);
*self_p = NULL;
}
}
// .split configure retry behavior
// These are the class methods. We can set the request timeout and number
// of retry attempts before sending requests:
// Set request timeout
void mdcli_set_timeout (mdcli_t *self, int timeout)
{
assert (self);
self->timeout = timeout;
}
// Set request retries
void mdcli_set_retries (mdcli_t *self, int retries)
{
assert (self);
self->retries = retries;
}
// .split send request and wait for reply
// Here is the {{send}} method. It sends a request to the broker and gets
// a reply even if it has to retry several times. It takes ownership of
// the request message, and destroys it when sent. It returns the reply
// message, or NULL if there was no reply after multiple attempts:
zmsg_t *
mdcli_send (mdcli_t *self, char *service, zmsg_t **request_p)
{
assert (self);
assert (request_p);
zmsg_t *request = *request_p;
// Prefix request with protocol frames
// Frame 1: "MDPCxy" (six bytes, MDP/Client x.y)
// Frame 2: Service name (printable string)
zmsg_pushstr (request, service);
zmsg_pushstr (request, MDPC_CLIENT);
if (self->verbose) {
zclock_log ("I: send request to '%s' service:", service);
zmsg_dump (request);
}
int retries_left = self->retries;
while (retries_left && !zctx_interrupted) {
zmsg_t *msg = zmsg_dup (request);
zmsg_send (&msg, self->client);
zmq_pollitem_t items [] = {
{ self->client, 0, ZMQ_POLLIN, 0 }
};
// .split body of send
// On any blocking call, {{libzmq}} will return -1 if there was
// an error; we could in theory check for different error codes,
// but in practice it's OK to assume it was {{EINTR}} (Ctrl-C):
int rc = zmq_poll (items, 1, self->timeout * ZMQ_POLL_MSEC);
if (rc == -1)
break; // Interrupted
// If we got a reply, process it
if (items [0].revents & ZMQ_POLLIN) {
zmsg_t *msg = zmsg_recv (self->client);
if (self->verbose) {
zclock_log ("I: received reply:");
zmsg_dump (msg);
}
// We would handle malformed replies better in real code
assert (zmsg_size (msg) >= 3);
zframe_t *header = zmsg_pop (msg);
assert (zframe_streq (header, MDPC_CLIENT));
zframe_destroy (&header);
zframe_t *reply_service = zmsg_pop (msg);
assert (zframe_streq (reply_service, service));
zframe_destroy (&reply_service);
zmsg_destroy (&request);
return msg; // Success
}
else if (--retries_left) {
if (self->verbose)
zclock_log ("W: no reply, reconnecting...");
s_mdcli_connect_to_broker (self);
}
else {
if (self->verbose)
zclock_log ("W: permanent error, abandoning");
break; // Give up
}
}
if (zctx_interrupted)
printf ("W: interrupt received, killing client...\n");
zmsg_destroy (&request);
return NULL;
}
mdcliapi: Majordomo client API in C++
#ifndef __MDCLIAPI_HPP_INCLUDED__
#define __MDCLIAPI_HPP_INCLUDED__
//#include "mdcliapi.h"
#include"zmsg.hpp"#include"mdp.h"classmdcli {
public:
// ---------------------------------------------------------------------
// Constructor
mdcli (std::string broker, int verbose): m_broker(broker), m_verbose(verbose)
{
assert (broker.size()!=0);
s_version_assert (4, 0);
m_context = new zmq::context_t(1);
s_catch_signals ();
connect_to_broker ();
}
// Destructor
virtual
~mdcli ()
{
delete m_client;
delete m_context;
}
// ---------------------------------------------------------------------
// Connect or reconnect to broker
void connect_to_broker ()
{
if (m_client) {
delete m_client;
}
m_client = new zmq::socket_t (*m_context, ZMQ_REQ);
s_set_id(*m_client);
int linger = 0;
m_client->setsockopt(ZMQ_LINGER, &linger, sizeof (linger));
m_client->connect (m_broker.c_str());
if (m_verbose) {
s_console ("I: connecting to broker at %s...", m_broker.c_str());
}
}
// ---------------------------------------------------------------------
// Set request timeout
void set_timeout (int timeout)
{
m_timeout = timeout;
}
// ---------------------------------------------------------------------
// Set request retries
void set_retries (int retries)
{
m_retries = retries;
}
// ---------------------------------------------------------------------
// Send request to broker and get reply by hook or crook
// Takes ownership of request message and destroys it when sent.
// Returns the reply message or NULL if there was no reply.
zmsg *
send (std::string service, zmsg *&request_p)
{
assert (request_p);
zmsg *request = request_p;
// Prefix request with protocol frames
// Frame 1: "MDPCxy" (six bytes, MDP/Client x.y)
// Frame 2: Service name (printable string)
request->push_front(service.c_str());
request->push_front(k_mdp_client.data());
if (m_verbose) {
s_console ("I: send request to '%s' service:", service.c_str());
request->dump();
}
int retries_left = m_retries;
while (retries_left && !s_interrupted) {
zmsg * msg = new zmsg(*request);
msg->send(*m_client);
while (!s_interrupted) {
// Poll socket for a reply, with timeout
zmq::pollitem_t items [] = {
{ *m_client, 0, ZMQ_POLLIN, 0 } };
zmq::poll (items, 1, m_timeout);
// If we got a reply, process it
if (items [0].revents & ZMQ_POLLIN) {
zmsg * recv_msg = new zmsg(*m_client);
if (m_verbose) {
s_console ("I: received reply:");
recv_msg->dump ();
}
// Don't try to handle errors, just assert noisily
assert (recv_msg->parts () >= 3);
ustring header = recv_msg->pop_front();
assert (header.compare((unsigned char *)k_mdp_client.data()) == 0);
ustring reply_service = recv_msg->pop_front();
assert (reply_service.compare((unsigned char *)service.c_str()) == 0);
delete request;
return recv_msg; // Success
}
else {
if (--retries_left) {
if (m_verbose) {
s_console ("W: no reply, reconnecting...");
}
// Reconnect, and resend message
connect_to_broker ();
zmsg msg (*request);
msg.send (*m_client);
}
else {
if (m_verbose) {
s_console ("W: permanent error, abandoning request");
}
break; // Give up
}
}
}
}
if (s_interrupted) {
std::cout << "W: interrupt received, killing client..." << std::endl;
}
delete request;
return 0;
}
private:
const std::string m_broker;
zmq::context_t * m_context;
zmq::socket_t * m_client{nullptr}; // Socket to broker
const int m_verbose;               // Print activity to stdout
int m_timeout{2500}; // Request timeout
int m_retries{3}; // Request retries
};
#endif
// mdwrkapi class - Majordomo Protocol Worker API
// Implements the MDP/Worker spec at http://rfc.zeromq.org/spec:7.
#include"mdwrkapi.h"// Reliability parameters
#define HEARTBEAT_LIVENESS 3 // 3-5 is reasonable
// .split worker class structure
// This is the structure of a worker API instance. We use a pseudo-OO
// approach in a lot of the C examples, as well as the CZMQ binding:
// Structure of our class
// We access these properties only via class methods
struct _mdwrk_t {
zctx_t *ctx; // Our context
char *broker;
char *service;
void *worker; // Socket to broker
int verbose; // Print activity to stdout
// Heartbeat management
uint64_t heartbeat_at; // When to send HEARTBEAT
size_t liveness; // How many attempts left
int heartbeat; // Heartbeat delay, msecs
int reconnect; // Reconnect delay, msecs
int expect_reply; // Zero only at start
zframe_t *reply_to; // Return identity, if any
};
// .split utility functions
// We have two utility functions; to send a message to the broker and
// to (re)connect to the broker:
// Send message to broker
// If no msg is provided, creates one internally
static void s_mdwrk_send_to_broker (mdwrk_t *self, char *command, char *option,
zmsg_t *msg)
{
msg = msg? zmsg_dup (msg): zmsg_new ();
// Stack protocol envelope to start of message
if (option)
zmsg_pushstr (msg, option);
zmsg_pushstr (msg, command);
zmsg_pushstr (msg, MDPW_WORKER);
zmsg_pushstr (msg, "");
if (self->verbose) {
zclock_log ("I: sending %s to broker",
mdps_commands [(int) *command]);
zmsg_dump (msg);
}
zmsg_send (&msg, self->worker);
}
// Connect or reconnect to broker
void s_mdwrk_connect_to_broker (mdwrk_t *self)
{
if (self->worker)
zsocket_destroy (self->ctx, self->worker);
self->worker = zsocket_new (self->ctx, ZMQ_DEALER);
zmq_connect (self->worker, self->broker);
if (self->verbose)
zclock_log ("I: connecting to broker at %s...", self->broker);
// Register service with broker
s_mdwrk_send_to_broker (self, MDPW_READY, self->service, NULL);
// If liveness hits zero, queue is considered disconnected
self->liveness = HEARTBEAT_LIVENESS;
self->heartbeat_at = zclock_time () + self->heartbeat;
}
// .split constructor and destructor
// Here we have the constructor and destructor for our mdwrk class:
// Constructor
mdwrk_t *
mdwrk_new (char *broker, char *service, int verbose)
{
assert (broker);
assert (service);
mdwrk_t *self = (mdwrk_t *) zmalloc (sizeof (mdwrk_t));
self->ctx = zctx_new ();
self->broker = strdup (broker);
self->service = strdup (service);
self->verbose = verbose;
self->heartbeat = 2500; // msecs
self->reconnect = 2500; // msecs
s_mdwrk_connect_to_broker (self);
return self;
}
// Destructor
void mdwrk_destroy (mdwrk_t **self_p)
{
assert (self_p);
if (*self_p) {
mdwrk_t *self = *self_p;
zctx_destroy (&self->ctx);
free (self->broker);
free (self->service);
free (self);
*self_p = NULL;
}
}
// .split configure worker
// We provide two methods to configure the worker API. You can set the
// heartbeat interval and retries to match the expected network performance.
// Set heartbeat delay
void mdwrk_set_heartbeat (mdwrk_t *self, int heartbeat)
{
self->heartbeat = heartbeat;
}
// Set reconnect delay
void mdwrk_set_reconnect (mdwrk_t *self, int reconnect)
{
self->reconnect = reconnect;
}
// .split recv method
// This is the {{recv}} method; it's a little misnamed because it first sends
// any reply and then waits for a new request. If you have a better name
// for this, let me know.
// Send reply, if any, to broker and wait for next request.
zmsg_t *
mdwrk_recv (mdwrk_t *self, zmsg_t **reply_p)
{
// Format and send the reply if we were provided one
assert (reply_p);
zmsg_t *reply = *reply_p;
assert (reply || !self->expect_reply);
if (reply) {
assert (self->reply_to);
zmsg_wrap (reply, self->reply_to);
s_mdwrk_send_to_broker (self, MDPW_REPLY, NULL, reply);
zmsg_destroy (reply_p);
}
self->expect_reply = 1;
while (true) {
zmq_pollitem_t items [] = {
{ self->worker, 0, ZMQ_POLLIN, 0 } };
int rc = zmq_poll (items, 1, self->heartbeat * ZMQ_POLL_MSEC);
if (rc == -1)
break; // Interrupted
if (items [0].revents & ZMQ_POLLIN) {
zmsg_t *msg = zmsg_recv (self->worker);
if (!msg)
break; // Interrupted
if (self->verbose) {
zclock_log ("I: received message from broker:");
zmsg_dump (msg);
}
self->liveness = HEARTBEAT_LIVENESS;
// Don't try to handle errors, just assert noisily
assert (zmsg_size (msg) >= 3);
zframe_t *empty = zmsg_pop (msg);
assert (zframe_streq (empty, ""));
zframe_destroy (&empty);
zframe_t *header = zmsg_pop (msg);
assert (zframe_streq (header, MDPW_WORKER));
zframe_destroy (&header);
zframe_t *command = zmsg_pop (msg);
if (zframe_streq (command, MDPW_REQUEST)) {
// We should pop and save as many addresses as there are
// up to a null part, but for now, just save one...
self->reply_to = zmsg_unwrap (msg);
zframe_destroy (&command);
// .split process message
// Here is where we actually have a message to process; we
// return it to the caller application:
return msg; // We have a request to process
}
else if (zframe_streq (command, MDPW_HEARTBEAT))
; // Do nothing for heartbeats
else if (zframe_streq (command, MDPW_DISCONNECT))
s_mdwrk_connect_to_broker (self);
else {
zclock_log ("E: invalid input message");
zmsg_dump (msg);
}
zframe_destroy (&command);
zmsg_destroy (&msg);
}
else if (--self->liveness == 0) {
if (self->verbose)
zclock_log ("W: disconnected from broker - retrying...");
zclock_sleep (self->reconnect);
s_mdwrk_connect_to_broker (self);
}
// Send HEARTBEAT if it's time
if (zclock_time () > self->heartbeat_at) {
s_mdwrk_send_to_broker (self, MDPW_HEARTBEAT, NULL, NULL);
self->heartbeat_at = zclock_time () + self->heartbeat;
}
}
if (zctx_interrupted)
printf ("W: interrupt received, killing worker...\n");
return NULL;
}
mdwrkapi: Majordomo worker API in C++
#ifndef __MDWRKAPI_HPP_INCLUDED__
#define __MDWRKAPI_HPP_INCLUDED__
#include"zmsg.hpp"#include"mdp.h"// Reliability parameters
// Structure of our class
// We access these properties only via class methods
class mdwrk {
public:
// ---------------------------------------------------------------------
// Constructor
mdwrk (std::string broker, std::string service, int verbose): m_broker(broker), m_service(service), m_verbose(verbose)
{
s_version_assert (4, 0);
m_context = new zmq::context_t (1);
s_catch_signals ();
connect_to_broker ();
}
// ---------------------------------------------------------------------
// Destructor
virtual
~mdwrk ()
{
delete m_worker;
delete m_context;
}
// ---------------------------------------------------------------------
// Send message to broker
// If no _msg is provided, creates one internally
void send_to_broker (const char *command, std::string option, zmsg *_msg)
{
zmsg *msg = _msg? new zmsg(*_msg): new zmsg ();
// Stack protocol envelope to start of message
if (!option.empty()) {
msg->push_front (option.c_str());
}
msg->push_front (command);
msg->push_front (k_mdpw_worker.data());
msg->push_front ("");
if (m_verbose) {
s_console ("I: sending %s to broker",
mdps_commands [(int) *command].data());
msg->dump ();
}
msg->send (*m_worker);
delete msg;
}
// ---------------------------------------------------------------------
// Connect or reconnect to broker
void connect_to_broker ()
{
if (m_worker) {
delete m_worker;
}
m_worker = new zmq::socket_t (*m_context, ZMQ_DEALER);
int linger = 0;
m_worker->setsockopt (ZMQ_LINGER, &linger, sizeof (linger));
s_set_id(*m_worker);
m_worker->connect (m_broker.c_str());
if (m_verbose)
s_console ("I: connecting to broker at %s...", m_broker.c_str());
// Register service with broker
send_to_broker (k_mdpw_ready.data(), m_service, NULL);
// If liveness hits zero, queue is considered disconnected
m_liveness = n_heartbeat_liveness;
m_heartbeat_at = s_clock () + m_heartbeat;
}
// ---------------------------------------------------------------------
// Set heartbeat delay
void set_heartbeat (int heartbeat)
{
m_heartbeat = heartbeat;
}
// ---------------------------------------------------------------------
// Set reconnect delay
void set_reconnect (int reconnect)
{
m_reconnect = reconnect;
}
// ---------------------------------------------------------------------
// Send reply, if any, to broker and wait for next request.
zmsg *
recv (zmsg *&reply_p)
{
// Format and send the reply if we were provided one
zmsg *reply = reply_p;
assert (reply || !m_expect_reply);
if (reply) {
assert (m_reply_to.size()!=0);
reply->wrap (m_reply_to.c_str(), "");
m_reply_to = "";
send_to_broker (k_mdpw_reply.data(), "", reply);
delete reply_p;
reply_p = 0;
}
m_expect_reply = true;
while (!s_interrupted) {
zmq::pollitem_t items[] = {
{ *m_worker, 0, ZMQ_POLLIN, 0 } };
zmq::poll (items, 1, m_heartbeat);
if (items[0].revents & ZMQ_POLLIN) {
zmsg *msg = new zmsg(*m_worker);
if (m_verbose) {
s_console ("I: received message from broker:");
msg->dump ();
}
m_liveness = n_heartbeat_liveness;
// Don't try to handle errors, just assert noisily
assert (msg->parts () >= 3);
ustring empty = msg->pop_front ();
assert (empty.compare((unsigned char *)"") == 0);
//assert (strcmp (empty, "") == 0);
//free (empty);
ustring header = msg->pop_front ();
assert (header.compare((unsigned char *)k_mdpw_worker.data()) == 0);
//free (header);
std::string command =(char*) msg->pop_front ().c_str();
if (command.compare (k_mdpw_request.data()) == 0) {
// We should pop and save as many addresses as there are
// up to a null part, but for now, just save one...
m_reply_to = msg->unwrap ();
return msg; // We have a request to process
}
else if (command.compare (k_mdpw_heartbeat.data()) == 0) {
// Do nothing for heartbeats
}
else if (command.compare (k_mdpw_disconnect.data()) == 0) {
connect_to_broker ();
}
else {
s_console ("E: invalid input message (%d)",
(int) *(command.c_str()));
msg->dump ();
}
delete msg;
}
else if (--m_liveness == 0) {
if (m_verbose) {
s_console ("W: disconnected from broker - retrying...");
}
s_sleep (m_reconnect);
connect_to_broker ();
}
// Send HEARTBEAT if it's time
if (s_clock () >= m_heartbeat_at) {
send_to_broker (k_mdpw_heartbeat.data(), "", NULL);
m_heartbeat_at += m_heartbeat;
}
}
if (s_interrupted)
printf ("W: interrupt received, killing worker...\n");
return NULL;
}
private:
static constexpr uint32_t n_heartbeat_liveness = 3; // 3-5 is reasonable
const std::string m_broker;
const std::string m_service;
zmq::context_t *m_context;
zmq::socket_t *m_worker{}; // Socket to broker
const int m_verbose; // Print activity to stdout
// Heartbeat management
int64_t m_heartbeat_at; // When to send HEARTBEAT
size_t m_liveness; // How many attempts left
int m_heartbeat{2500}; // Heartbeat delay, msecs
int m_reconnect{2500}; // Reconnect delay, msecs
// Internal state
bool m_expect_reply{false}; // Zero only at start
// Return address, if any
std::string m_reply_to;
};
#endif
--
-- mdwrkapi.lua - Majordomo Protocol Worker API
--
-- Author: Robert G. Jakabosky <bobby@sharedrealm.com>
--
local HEARTBEAT_LIVENESS = 3 -- 3-5 is reasonable
local setmetatable = setmetatable
local mdp = require"mdp"
local zmq = require"zmq"
local zpoller = require"zmq.poller"
local zmsg = require"zmsg"
require"zhelpers"
local s_version_assert = s_version_assert
local obj_mt = {}
obj_mt.__index = obj_mt
functionobj_mt:set_heartbeat(heartbeat)
self.heartbeat = heartbeat
endfunctionobj_mt:set_reconnect(reconnect)
self.reconnect = reconnect
endfunctionobj_mt:destroy()
if self.worker then self.worker:close() end
self.context:term()
end-- Send message to broker-- If no msg is provided, create one internallylocalfunctions_mdwrk_send_to_broker(self, command, option, msg)
msg = msg or zmsg.new()
-- Stack protocol envelope to start of messageif option then
msg:push(option)
end
msg:push(command)
msg:push(mdp.MDPW_WORKER)
msg:push("")
if self.verbose then
s_console("I: sending %s to broker", mdp.mdps_commands[command])
msg:dump()
end
msg:send(self.worker)
endlocalfunctions_mdwrk_connect_to_broker(self)
-- close old socket.if self.worker then
self.poller:remove(self.worker)
self.worker:close()
end
self.worker = assert(self.context:socket(zmq.DEALER))
assert(self.worker:setopt(zmq.LINGER, 0))
assert(self.worker:connect(self.broker))
if self.verbose then
s_console("I: connecting to broker at %s...", self.broker)
end-- Register service with broker
s_mdwrk_send_to_broker(self, mdp.MDPW_READY, self.service)
-- If liveness hits zero, queue is considered disconnected
self.liveness = HEARTBEAT_LIVENESS
self.heartbeat_at = s_clock() + self.heartbeat
-- add socket to poller
self.poller:add(self.worker, zmq.POLLIN, function()
self.got_msg = trueend)
end---- Send reply, if any, to broker and wait for next request.--functionobj_mt:recv(reply)
-- Format and send the reply if we are provided oneif reply then
assert(self.reply_to)
reply:wrap(self.reply_to, "")
self.reply_to = nil
s_mdwrk_send_to_broker(self, mdp.MDPW_REPLY, nil, reply)
end
self.expect_reply = true
self.got_msg = falsewhiletruedolocal cnt = assert(self.poller:poll(self.heartbeat * 1000))
if cnt ~= 0and self.got_msg then
self.got_msg = falselocal msg = zmsg.recv(self.worker)
if self.verbose then
s_console("I: received message from broker:")
msg:dump()
end
self.liveness = HEARTBEAT_LIVENESS
-- Don't try to handle errors, just assert noisily
assert(msg:parts() >= 3)
local empty = msg:pop()
assert(empty == "")
local header = msg:pop()
assert(header == mdp.MDPW_WORKER)
local command = msg:pop()
if command == mdp.MDPW_REQUEST then-- We should pop and save as many addresses as there are-- up to a null part, but for now, just save one...
self.reply_to = msg:unwrap()
return msg -- We have a request to processelseif command == mdp.MDPW_HEARTBEAT then-- Do nothing for heartbeatselseif command == mdp.MDPW_DISCONNECT then-- dis-connect and re-connect to broker.
s_mdwrk_connect_to_broker(self)
else
s_console("E: invalid input message (%d)", command:byte(1,1))
msg:dump()
endelse
self.liveness = self.liveness - 1if (self.liveness == 0) thenif self.verbose then
s_console("W: disconnected from broker - retrying...")
end-- sleep then Reconnect
s_sleep(self.reconnect)
s_mdwrk_connect_to_broker(self)
end-- Send HEARTBEAT if it's timeif (s_clock() > self.heartbeat_at) then
s_mdwrk_send_to_broker(self, mdp.MDPW_HEARTBEAT)
self.heartbeat_at = s_clock() + self.heartbeat
endendendend
module(...)
functionnew(broker, service, verbose)
s_version_assert(2, 1);
local self = setmetatable({
context = zmq.init(1),
poller = zpoller.new(1),
broker = broker,
service = service,
verbose = verbose,
heartbeat = 2500, -- msecs
reconnect = 2500, -- msecs
}, obj_mt)
s_mdwrk_connect_to_broker(self)
return self
end
setmetatable(_M, { __call = function(self, ...) return new(...) end })
<?php
/* =====================================================================
* mdwrkapi.php
*
* Majordomo Protocol Worker API
* Implements the MDP/Worker spec at http://rfc.zeromq.org/spec:7.
*/
include_once 'zmsg.php';
include_once 'mdp.php';
// Reliability parameters
define("HEARTBEAT_LIVENESS", 3); // 3-5 is reasonable
// Structure of our class
// We access these properties only via class methods
class MDWrk
{
private $ctx; // Our context
private $broker;
private $service;
private $worker; // Socket to broker
private $verbose = false; // Print activity to stdout
// Heartbeat management
private $heartbeat_at; // When to send HEARTBEAT
private $liveness; // How many attempts left
private $heartbeat; // Heartbeat delay, msecs
private $reconnect; // Reconnect delay, msecs
// Internal state
private $expect_reply = 0;
// Return address, if any
private $reply_to;
/**
* Constructor
*
* @param string $broker
* @param string $service
* @param boolean $verbose
*/
public function __construct($broker, $service, $verbose = false)
{
$this->ctx = new ZMQContext();
$this->broker = $broker;
$this->service = $service;
$this->verbose = $verbose;
$this->heartbeat = 2500; // msecs
$this->reconnect = 2500; // msecs
$this->connect_to_broker();
}
/**
* Send message to broker
* If no msg is provided, creates one internally
*
* @param string $command
* @param string $option
* @param Zmsg $msg
*/
public function send_to_broker($command, $option, $msg = null)
{
$msg = $msg ? $msg : new Zmsg();
if ($option) {
$msg->push($option);
}
$msg->push($command);
$msg->push(MDPW_WORKER);
$msg->push("");
if ($this->verbose) {
printf("I: sending %s to broker %s", $command, PHP_EOL);
echo $msg->__toString();
}
$msg->set_socket($this->worker)->send();
}
/**
* Connect or reconnect to broker
*/
public function connect_to_broker()
{
$this->worker = new ZMQSocket($this->ctx, ZMQ::SOCKET_DEALER);
$this->worker->connect($this->broker);
if ($this->verbose) {
printf("I: connecting to broker at %s... %s", $this->broker, PHP_EOL);
}
// Register service with broker
$this->send_to_broker(MDPW_READY, $this->service, NULL);
// If liveness hits zero, queue is considered disconnected
$this->liveness = HEARTBEAT_LIVENESS;
$this->heartbeat_at = microtime(true) + ($this->heartbeat / 1000);
}
/**
* Set heartbeat delay
*
* @param int $heartbeat
*/
public function set_heartbeat($heartbeat)
{
$this->heartbeat = $heartbeat;
}
/**
* Set reconnect delay
*
* @param int $reconnect
*/
public function set_reconnect($reconnect)
{
$this->reconnect = $reconnect;
}
/**
* Send reply, if any, to broker and wait for next request.
*
* @param Zmsg $reply
* @return Zmsg Returns if there is a request to process
*/
public function recv($reply = null)
{
// Format and send the reply if we were provided one
assert ($reply || !$this->expect_reply);
if ($reply) {
$reply->wrap($this->reply_to);
$this->send_to_broker(MDPW_REPLY, NULL, $reply);
}
$this->expect_reply = true;
$read = $write = array();
while (true) {
$poll = new ZMQPoll();
$poll->add($this->worker, ZMQ::POLL_IN);
$events = $poll->poll($read, $write, $this->heartbeat);
if ($events) {
$zmsg = new Zmsg($this->worker);
$zmsg->recv();
if ($this->verbose) {
echo"I: received message from broker:", PHP_EOL;
echo$zmsg->__toString();
}
$this->liveness = HEARTBEAT_LIVENESS;
// Don't try to handle errors, just assert noisily
assert ($zmsg->parts() >= 3);
$zmsg->pop();
$header = $zmsg->pop();
assert($header == MDPW_WORKER);
$command = $zmsg->pop();
if ($command == MDPW_REQUEST) {
// We should pop and save as many addresses as there are
// up to a null part, but for now, just save one...
$this->reply_to = $zmsg->unwrap();
return $zmsg; // We have a request to process
} elseif ($command == MDPW_HEARTBEAT) {
// Do nothing for heartbeats
} elseif ($command == MDPW_DISCONNECT) {
$this->connect_to_broker();
} else {
echo"E: invalid input message", PHP_EOL;
echo$zmsg->__toString();
}
} elseif (--$this->liveness == 0) { // poll ended on timeout, $event being false
if ($this->verbose) {
echo"W: disconnected from broker - retrying...", PHP_EOL;
}
usleep($this->reconnect*1000);
$this->connect_to_broker();
}
// Send HEARTBEAT if it's time
if (microtime(true) > $this->heartbeat_at) {
$this->send_to_broker(MDPW_HEARTBEAT, NULL, NULL);
$this->heartbeat_at = microtime(true) + ($this->heartbeat/1000);
}
}
}
}
mdwrkapi: Majordomo worker API in Python
"""Majordomo Protocol Worker API, Python version
Implements the MDP/Worker spec at http:#rfc.zeromq.org/spec:7.
Author: Min RK <benjaminrk@gmail.com>
Based on Java example by Arkadiusz Orzechowski
"""importloggingimporttimeimportzmqfromzhelpersimport dump
# MajorDomo protocol constants:
import MDP


class MajorDomoWorker(object):
"""Majordomo Protocol Worker API, Python version
Implements the MDP/Worker spec at http:#rfc.zeromq.org/spec:7.
"""
HEARTBEAT_LIVENESS = 3# 3-5 is reasonable
broker = None
ctx = None
service = None
worker = None # Socket to broker
heartbeat_at = 0 # When to send HEARTBEAT (relative to time.time(), so in seconds)
liveness = 0 # How many attempts left
heartbeat = 2500 # Heartbeat delay, msecs
reconnect = 2500 # Reconnect delay, msecs
# Internal state
expect_reply = False # False only at start
timeout = 2500 # poller timeout
verbose = False # Print activity to stdout
# Return address, if any
reply_to = None
def __init__(self, broker, service, verbose=False):
self.broker = broker
self.service = service
self.verbose = verbose
self.ctx = zmq.Context()
self.poller = zmq.Poller()
logging.basicConfig(format="%(asctime)s%(message)s",
datefmt="%Y-%m-%d %H:%M:%S",
level=logging.INFO)
self.reconnect_to_broker()
def reconnect_to_broker(self):
"""Connect or reconnect to broker"""
if self.worker:
self.poller.unregister(self.worker)
self.worker.close()
self.worker = self.ctx.socket(zmq.DEALER)
self.worker.linger = 0
self.worker.connect(self.broker)
self.poller.register(self.worker, zmq.POLLIN)
if self.verbose:
logging.info("I: connecting to broker at %s...", self.broker)
# Register service with broker
self.send_to_broker(MDP.W_READY, self.service, [])
# If liveness hits zero, queue is considered disconnected
self.liveness = self.HEARTBEAT_LIVENESS
self.heartbeat_at = time.time() + 1e-3 * self.heartbeat
def send_to_broker(self, command, option=None, msg=None):
"""Send message to broker.
If no msg is provided, creates one internally
"""if msg is None:
msg = []
elif not isinstance(msg, list):
msg = [msg]
if option:
msg = [option] + msg
msg = [b'', MDP.W_WORKER, command] + msg
if self.verbose:
logging.info("I: sending %s to broker", command)
dump(msg)
self.worker.send_multipart(msg)
def recv(self, reply=None):
"""Send reply, if any, to broker and wait for next request."""
# Format and send the reply if we were provided one
assert reply is not None or not self.expect_reply
if reply is not None:
assert self.reply_to is not None
reply = [self.reply_to, b''] + reply
self.send_to_broker(MDP.W_REPLY, msg=reply)
self.expect_reply = True
while True:
# Poll socket for a reply, with timeout
try:
items = self.poller.poll(self.timeout)
except KeyboardInterrupt:
break # Interrupted
if items:
msg = self.worker.recv_multipart()
if self.verbose:
logging.info("I: received message from broker: ")
dump(msg)
self.liveness = self.HEARTBEAT_LIVENESS
# Don't try to handle errors, just assert noisily
assert len(msg) >= 3
empty = msg.pop(0)
assert empty == b''
header = msg.pop(0)
assert header == MDP.W_WORKER
command = msg.pop(0)
if command == MDP.W_REQUEST:
# We should pop and save as many addresses as there are# up to a null part, but for now, just save one...
self.reply_to = msg.pop(0)
# pop empty
empty = msg.pop(0)
assert empty == b''
return msg # We have a request to process
elif command == MDP.W_HEARTBEAT:
# Do nothing for heartbeats
pass
elif command == MDP.W_DISCONNECT:
self.reconnect_to_broker()
else :
logging.error("E: invalid input message: ")
dump(msg)
else:
self.liveness -= 1
if self.liveness == 0:
if self.verbose:
logging.warn("W: disconnected from broker - retrying...")
try:
time.sleep(1e-3*self.reconnect)
except KeyboardInterrupt:
break
self.reconnect_to_broker()
# Send HEARTBEAT if it's time
if time.time() > self.heartbeat_at:
self.send_to_broker(MDP.W_HEARTBEAT)
self.heartbeat_at = time.time() + 1e-3*self.heartbeat
logging.warn("W: interrupt received, killing worker...")
return None
def destroy(self):
# context.destroy depends on pyzmq >= 2.1.10
self.ctx.destroy(0)
#!/usr/bin/env ruby

# Majordomo Protocol Worker API, Ruby version
#
# Implements the MDP/Worker spec at http://rfc.zeromq.org/spec:7.
#
# Author: Tom van Leeuwen <tom@vleeuwen.eu>
# Based on Python example by Min RK

require 'ffi-rzmq'
require './mdp.rb'

class MajorDomoWorker
HEARTBEAT_LIVENESS = 3 # 3-5 is reasonable

def initialize broker, service
@broker = broker
@service = service
@context = ZMQ::Context.new(1)
@poller = ZMQ::Poller.new
@worker = nil# Socket to broker
@heartbeat_at = 0# When to send HEARTBEAT (relative to time.time(), so in seconds)
@liveness = 0# How many attempts left
@timeout = 2500
@heartbeat = 2500# Heartbeat delay, msecs
@reconnect = 2500# Reconnect delay, msecs
@expect_reply = false# false only at start
@reply_to = nil
reconnect_to_broker
enddefrecv reply
if reply and @reply_to
reply = reply.is_a?(Array) ? [@reply_to, ''].concat(reply) : [@reply_to, '', reply]
send_to_broker MDP::W_REPLY, nil, reply
end
@expect_reply = trueloopdo
items = @poller.poll(@timeout)
if items
messages = []
@worker.recv_strings messages
@liveness = HEARTBEAT_LIVENESS
messages.shift # emptyif messages.shift != MDP::W_WORKERputs"E: Header is not MDP::WORKER"end
command = messages.shift
case command
whenMDP::W_REQUEST# We should pop and save as many addresses as there are# up to a null part, but for now, just save one...
@reply_to = messages.shift
messages.shift # emptyreturn messages # We have a request to processwhenMDP::W_HEARTBEAT# do nothingwhenMDP::W_DISCONNECT
reconnect_to_broker
elseendelse
@liveness -= 1if @liveness == 0sleep0.001*@reconnect
reconnect_to_broker
endendifTime.now > @heartbeat_at
send_to_broker MDP::W_HEARTBEAT
@heartbeat_at = Time.now + 0.001 * @heartbeat
endendenddefreconnect_to_brokerif @worker
@poller.deregister @worker, ZMQ::DEALER
@worker.close
end
@worker = @context.socket ZMQ::DEALER
@worker.setsockopt ZMQ::LINGER, 0
@worker.connect @broker
@poller.register @worker, ZMQ::POLLIN
send_to_broker(MDP::W_READY, @service, [])
@liveness = HEARTBEAT_LIVENESS
@heartbeat_at = Time.now + 0.001 * @heartbeat
enddefsend_to_broker command, option=nil, message=nil# if no message is provided, create on internallyif message.nil?
message = []
elsifnot message.is_a?(Array)
message = [message]
end
message = [option].concat message if option
message = ['', MDP::W_WORKER, command].concat message
@worker.send_strings message
endend
# Majordomo Protocol Worker API, Tcl version.
# Implements the MDP/Worker spec at http://rfc.zeromq.org/spec:7.
package require TclOO
package require zmq
package require mdp
package provide MDWorker 1.0oo::class create MDWorker {variable context broker service worker verbose heartbeat_at liveness heartbeat reconnect expect_reply reply_to
constructor{ibroker iservice {iverbose}}{set context [zmq context mdwrk_context_[::mdp::contextid]]set broker $ibrokerset service $iserviceset verbose $iverboseset heartbeat 2500set reconnect 2500set expect_reply 0set reply_to ""set worker ""my connect_to_broker
}destructor{$workerclose$contextterm}# Send message to broker
method send_to_broker {command option msg}{# Stack protocol envelope to start of message
if{[string length $option]}{set msg [zmsg push $msg$option]}set msg [zmsg push $msg$::mdp::MDPW_COMMAND($command)]set msg [zmsg push $msg$::mdp::MDPW_WORKER]set msg [zmsg push $msg""]if{$verbose}{puts"I: sending $command to broker"puts[join[zmsg dump $msg]\n]}zmsg send $worker$msg}# Connect or reconnect to broker
method connect_to_broker {}{if{[string length $worker]}{$workerclose}set worker [zmq socket mdwrk_socket_[::mdp::socketid]$context DEALER]$workerconnect$brokerif{$verbose}{puts"I: connecting to broker at $broker..."}# Register service with broker
my send_to_broker READY $service{}# If liveness hits zero, queue is considered disconnected
set liveness $::mdp::HEARTBEAT_LIVENESSset heartbeat_at [expr{[clock milliseconds] + $heartbeat}]}# Set heartbeat delay
method set_heartbeat {iheartbeat}{set heartbeat $iheartbeat}# Set reconnect delay
method set_reconnect {ireconnect}{set reconnect $ireconnect}# Send reply, if any, to broker and wait for next request.
method recv {reply}{# Format and send the reply if we were provided one
if{!([string length $reply] || !$expect_reply)}{error"reply expected"}if{[string length $reply]}{if{![string length $reply_to]}{error"no reply_to found"}set reply [zmsg wrap $reply$reply_to]my send_to_broker REPLY {}$reply}set expect_reply 1while{1}{set poll_set [list[list$worker[list POLLIN]]]set rpoll_set [zmq poll $poll_set$heartbeat]if{[llength$rpoll_set] && "POLLIN"in[lindex$rpoll_set01]}{set msg [zmsg recv $worker]if{$verbose}{puts"I: received message from broker:"puts[join[zmsg dump $msg]\n]}set liveness $::mdp::HEARTBEAT_LIVENESS# Don't try to handle errors, just assert noisily
if{[llength$msg] < 3}{error"invalid message size"}set empty [zmsg pop msg]if{[string length $empty]}{error"expected empty frame"}set header [zmsg pop msg]if{$headerne$mdp::MDPW_WORKER}{error"unexpected header"}set command [zmsg pop msg]if{$commandeq$::mdp::MDPW_COMMAND(REQUEST)}{# We should pop and save as many addresses as there are
# up to a null part, but for now, just save one…
set reply_to [zmsg unwrap msg]return$msg;# We have a request to process
}elseif{$commandeq$mdp::MDPW_COMMAND(HEARTBEAT)}{;# Do nothing for heartbeats
}elseif{$commandeq$mdp::MDPW_COMMAND(DISCONNECT)}{my connect_to_broker
}else{puts"E: invalid input message"puts[join[zmsg dump $msg]\n]}}elseif{[incr liveness -1] == 0}{if{$verbose}{puts"W: disconnected from broker - retrying..."}after$reconnectmy connect_to_broker
}# Send HEARTBEAT if it's time
if{[clock milliseconds] > $heartbeat_at}{my send_to_broker HEARTBEAT {}{}set heartbeat_at [expr{[clock milliseconds] + $heartbeat}]}}}}
// Majordomo Protocol worker example
// Uses the mdwrk API to hide all MDP aspects
//
// To run this example, you may need to run multiple *.go files as below
// go run mdp.go zhelpers.go mdwrkapi.go mdworker.go [-v]
//
// Author: iano <scaly.iano@gmail.com>
package main
import (
"os"
)
func main() {
verbose := len(os.Args) >= 2 && os.Args[1] == "-v"
worker := NewWorker("tcp://localhost:5555", "echo", verbose)
for reply := [][]byte{}; ; {
request := worker.Recv(reply)
if len(request) == 0 {
break
}
reply = request
}
}
Here are some things to note about the worker API code:
The APIs are single-threaded. This means, for example, that the worker won’t send heartbeats in the background. Happily, this is exactly what we want: if the worker application gets stuck, heartbeats will stop and the broker will stop sending requests to the worker.
The worker API doesn’t do an exponential back-off; it’s not worth the extra complexity.
The APIs don’t do any error reporting. If something isn’t as expected, they raise an assertion (or exception depending on the language). This is ideal for a reference implementation, so any protocol errors show immediately. For real applications, the API should be robust against invalid messages.
You might wonder why the worker API is manually closing its socket and opening a new one, when ZeroMQ will automatically reconnect a socket if the peer disappears and comes back. Look back at the Simple Pirate and Paranoid Pirate workers to understand. Although ZeroMQ will automatically reconnect workers if the broker dies and comes back up, this isn’t sufficient to re-register the workers with the broker. I know of at least two solutions. The simplest, which we use here, is for the worker to monitor the connection using heartbeats, and if it decides the broker is dead, to close its socket and start afresh with a new socket. The alternative is for the broker to challenge unknown workers when it gets a heartbeat from the worker and ask them to re-register. That would require protocol support.
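That reconnection strategy is easy to see in miniature. The following is a rough sketch in Python with pyzmq, not the mdwrk API above; the endpoint, the make_socket helper, and the frame values (taken from the MDP spec, but still an assumption here) are for illustration only. The worker counts down a liveness value on every silent poll interval, throws its socket away when the count hits zero, and sends its own heartbeats inline because there is no background thread:

import time
import zmq

HEARTBEAT_LIVENESS = 3        # missed intervals before we assume the broker died
HEARTBEAT_INTERVAL = 2.5      # seconds

def make_socket(ctx, endpoint="tcp://localhost:5555"):
    # Assumed helper: a fresh DEALER socket; a real worker would also
    # re-send READY with its service name here.
    worker = ctx.socket(zmq.DEALER)
    worker.linger = 0
    worker.connect(endpoint)
    return worker

def worker_loop(endpoint="tcp://localhost:5555"):
    ctx = zmq.Context.instance()
    worker = make_socket(ctx, endpoint)
    poller = zmq.Poller()
    poller.register(worker, zmq.POLLIN)
    liveness = HEARTBEAT_LIVENESS
    heartbeat_at = time.time() + HEARTBEAT_INTERVAL
    while True:
        if poller.poll(HEARTBEAT_INTERVAL * 1000):
            worker.recv_multipart()          # any traffic counts as a sign of life
            liveness = HEARTBEAT_LIVENESS
        else:
            liveness -= 1
            if liveness == 0:                # broker presumed dead: discard the socket
                poller.unregister(worker)
                worker.close()
                worker = make_socket(ctx, endpoint)   # start afresh and re-register
                poller.register(worker, zmq.POLLIN)
                liveness = HEARTBEAT_LIVENESS
        if time.time() > heartbeat_at:       # no background thread: heartbeat inline
            worker.send_multipart([b"", b"MDPW01", b"\x04"])  # MDP/Worker HEARTBEAT
            heartbeat_at = time.time() + HEARTBEAT_INTERVAL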
Now let’s design the Majordomo broker. Its core structure is a set of queues, one per service. We will create these queues as workers appear (we could delete them as workers disappear, but forget that for now because it gets complex). Additionally, we keep a queue of workers per service.
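Before reading the broker code, here is a rough sketch of those structures in illustrative Python (invented names, not the broker implementation that follows): each service owns a queue of pending requests and a queue of idle workers, services are created lazily the first time they are mentioned, and dispatching simply pairs the two queues oldest-first:

from collections import deque

class Service:
    def __init__(self, name):
        self.name = name
        self.requests = deque()    # client requests not yet handed to a worker
        self.waiting = deque()     # workers idle and ready for this service

class Broker:
    def __init__(self):
        self.services = {}         # service name -> Service, created on demand
        self.workers = {}          # worker identity -> per-worker state (service, expiry)

    def require_service(self, name):
        # Create the service entry lazily, the first time it is mentioned
        if name not in self.services:
            self.services[name] = Service(name)
        return self.services[name]

    def dispatch(self, service):
        # Pair queued requests with idle workers, oldest first; the real broker
        # sends each request to its worker over the shared ROUTER socket
        while service.requests and service.waiting:
            yield service.waiting.popleft(), service.requests.popleft()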
--
-- Majordomo Protocol broker
-- A minimal implementation of http://rfc.zeromq.org/spec:7 and spec:8
--
-- Author: Robert G. Jakabosky <bobby@sharedrealm.com>
--
require"zmq"
require"zmq.poller"
require"zmsg"
require"zhelpers"
require"mdp"local tremove = table.remove
-- We'd normally pull these from config datalocal HEARTBEAT_LIVENESS = 3-- 3-5 is reasonablelocal HEARTBEAT_INTERVAL = 2500-- msecslocal HEARTBEAT_EXPIRY = HEARTBEAT_INTERVAL * HEARTBEAT_LIVENESS
-- ----------------------------------------------------------------------- Constructor for broker object-- ----------------------------------------------------------------------- Broker object's metatable.local broker_mt = {}
broker_mt.__index = broker_mt
functionbroker_new(verbose)
local context = zmq.init(1)
-- Initialize broker statereturn setmetatable({
context = context,
socket = context:socket(zmq.ROUTER),
verbose = verbose,
services = {},
workers = {},
waiting = {},
heartbeat_at = s_clock() + HEARTBEAT_INTERVAL,
}, broker_mt)
end-- ----------------------------------------------------------------------- Service objectlocal service_mt = {}
service_mt.__index = service_mt
-- Worker objectlocal worker_mt = {}
worker_mt.__index = worker_mt
-- helper list remove functionlocalfunctionzlist_remove(list, item)
for n=#list,1,-1doif list[n] == item then
tremove(list, n)
endendend-- ----------------------------------------------------------------------- Destructor for broker objectfunctionbroker_mt:destroy()
self.socket:close()
self.context:term()
for name, service in pairs(self.services) do
service:destroy()
endfor id, worker in pairs(self.workers) do
worker:destroy()
endend-- ----------------------------------------------------------------------- Bind broker to endpoint, can call this multiple times-- We use a single socket for both clients and workers.functionbroker_mt:bind(endpoint)
self.socket:bind(endpoint)
s_console("I: MDP broker/0.1.1 is active at %s", endpoint)
end-- ----------------------------------------------------------------------- Delete any idle workers that haven't pinged us in a while.functionbroker_mt:purge_workers()
local waiting = self.waiting
for n=1,#waiting dolocal worker = waiting[n]
if (worker:expired()) thenif (self.verbose) then
s_console("I: deleting expired worker: %s", worker.identity)
end
self:worker_delete(worker, false)
endendend-- ----------------------------------------------------------------------- Locate or create new service entryfunctionbroker_mt:service_require(name)
assert (name)
local service = self.services[name]
ifnot service then
service = setmetatable({
name = name,
requests = {},
waiting = {},
workers = 0,
}, service_mt)
self.services[name] = service
if (self.verbose) then
s_console("I: received message:")
endendreturn service
end-- ----------------------------------------------------------------------- Destroy service object, called when service is removed from-- broker.services.functionservice_mt:destroy()
end-- ----------------------------------------------------------------------- Dispatch requests to waiting workers as possiblefunctionbroker_mt:service_dispatch(service, msg)
assert (service)
local requests = service.requests
if (msg) then-- Queue message if any
requests[#requests + 1] = msg
end
self:purge_workers()
local waiting = service.waiting
while (#waiting > 0and #requests > 0) dolocal worker = tremove(waiting, 1) -- pop worker from service's waiting queue.
zlist_remove(self.waiting, worker) -- also remove worker from broker's waiting queue.local msg = tremove(requests, 1) -- pop request from service's request queue.
self:worker_send(worker, mdp.MDPW_REQUEST, nil, msg)
endend-- ----------------------------------------------------------------------- Handle internal service according to 8/MMI specificationfunctionbroker_mt:service_internal(service_name, msg)
if (service_name == "mmi.service") thenlocal name = msg:body()
local service = self.services[name]
if (service and service.workers) then
msg:body_set("200")
else
msg:body_set("404")
endelse
msg:body_set("501")
end-- Remove & save client return envelope and insert the-- protocol header and service name, then rewrap envelope.local client = msg:unwrap()
msg:wrap(mdp.MDPC_CLIENT, service_name)
msg:wrap(client, "")
msg:send(self.socket)
end-- ----------------------------------------------------------------------- Creates worker if necessaryfunctionbroker_mt:worker_require(identity)
assert (identity)
-- self.workers is keyed off worker identitylocal worker = self.workers[identity]
if (not worker) then
worker = setmetatable({
identity = identity,
expiry = 0,
}, worker_mt)
self.workers[identity] = worker
if (self.verbose) then
s_console("I: registering new worker: %s", identity)
endendreturn worker
end-- ----------------------------------------------------------------------- Deletes worker from all data structures, and destroys workerfunctionbroker_mt:worker_delete(worker, disconnect)
assert (worker)
if (disconnect) then
self:worker_send(worker, mdp.MDPW_DISCONNECT)
endlocal service = worker.service
if (service) then
zlist_remove (service.waiting, worker)
service.workers = service.workers - 1end
zlist_remove (self.waiting, worker)
self.workers[worker.identity] = nil
worker:destroy()
end-- ----------------------------------------------------------------------- Destroy worker object, called when worker is removed from-- broker.workers.functionworker_mt:destroy(argument)
end-- ----------------------------------------------------------------------- Process message sent to us by a workerfunctionbroker_mt:worker_process(sender, msg)
assert (msg:parts() >= 1) -- At least, commandlocal command = msg:pop()
local worker_ready = (self.workers[sender] ~= nil)
local worker = self:worker_require(sender)
if (command == mdp.MDPW_READY) thenif (worker_ready) then-- Not first command in session then
self:worker_delete(worker, true)
elseif (sender:sub(1,4) == "mmi.") then-- Reserved service name
self:worker_delete(worker, true)
else-- Attach worker to service and mark as idlelocal service_name = msg:pop()
local service = self:service_require(service_name)
worker.service = service
service.workers = service.workers + 1
self:worker_waiting(worker)
endelseif (command == mdp.MDPW_REPLY) thenif (worker_ready) then-- Remove & save client return envelope and insert the-- protocol header and service name, then rewrap envelope.local client = msg:unwrap()
msg:wrap(mdp.MDPC_CLIENT, worker.service.name)
msg:wrap(client, "")
msg:send(self.socket)
self:worker_waiting(worker)
else
self:worker_delete(worker, true)
endelseif (command == mdp.MDPW_HEARTBEAT) thenif (worker_ready) then
worker.expiry = s_clock() + HEARTBEAT_EXPIRY
else
self:worker_delete(worker, true)
endelseif (command == mdp.MDPW_DISCONNECT) then
self:worker_delete(worker, false)
else
s_console("E: invalid input message (%d)", command:byte(1,1))
msg:dump()
endend-- ----------------------------------------------------------------------- Send message to worker-- If pointer to message is provided, sends & destroys that messagefunctionbroker_mt:worker_send(worker, command, option, msg)
msg = msg and msg:dup() or zmsg.new()
-- Stack protocol envelope to start of messageif (option) then-- Optional frame after command
msg:push(option)
end
msg:push(command)
msg:push(mdp.MDPW_WORKER)
-- Stack routing envelope to start of message
msg:wrap(worker.identity, "")
if (self.verbose) then
s_console("I: sending %s to worker", mdp.mdps_commands[command])
msg:dump()
end
msg:send(self.socket)
end-- ----------------------------------------------------------------------- This worker is now waiting for workfunctionbroker_mt:worker_waiting(worker)
-- Queue to broker and service waiting lists
self.waiting[#self.waiting + 1] = worker
worker.service.waiting[#worker.service.waiting + 1] = worker
worker.expiry = s_clock() + HEARTBEAT_EXPIRY
self:service_dispatch(worker.service, nil)
end-- ----------------------------------------------------------------------- Return 1 if worker has expired and must be deletedfunctionworker_mt:expired()
return (self.expiry < s_clock())
end-- ----------------------------------------------------------------------- Process a request coming from a clientfunctionbroker_mt:client_process(sender, msg)
assert (msg:parts() >= 2) -- Service name + bodylocal service_name = msg:pop()
local service = self:service_require(service_name)
-- Set reply return address to client sender
msg:wrap(sender, "")
if (service_name:sub(1,4) == "mmi.") then
self:service_internal(service_name, msg)
else
self:service_dispatch(service, msg)
endend-- ----------------------------------------------------------------------- Main broker work happens herelocal verbose = (arg[1] == "-v")
s_version_assert (2, 1)
s_catch_signals ()
local self = broker_new(verbose)
self:bind("tcp://*:5555")
local poller = zmq.poller.new(1)
-- Process next input message, if any
poller:add(self.socket, zmq.POLLIN, function()
local msg = zmsg.recv(self.socket)
if (self.verbose) then
s_console("I: received message:")
msg:dump()
endlocal sender = msg:pop()
local empty = msg:pop()
local header = msg:pop()
if (header == mdp.MDPC_CLIENT) then
self:client_process(sender, msg)
elseif (header == mdp.MDPW_WORKER) then
self:worker_process(sender, msg)
else
s_console("E: invalid message:")
msg:dump()
endend)
-- Get and process messages forever or until interruptedwhile (not s_interrupted) dolocal cnt = assert(poller:poll(HEARTBEAT_INTERVAL * 1000))
-- Disconnect and delete any expired workers-- Send heartbeats to idle workers if neededif (s_clock() > self.heartbeat_at) then
self:purge_workers()
local waiting = self.waiting
for n=1,#waiting dolocal worker = waiting[n]
self:worker_send(worker, mdp.MDPW_HEARTBEAT)
end
self.heartbeat_at = s_clock() + HEARTBEAT_INTERVAL
endendif (s_interrupted) then
printf("W: interrupt received, shutting down...\n")
end
self:destroy()
## Majordomo Protocol broker
# A minimal implementation of http://rfc.zeromq.org/spec:7 and spec:8
#
lappend auto_path .
package require TclOO
package require zmq
package require mdp
lappend auto_path .
set verbose 0foreach{k v}$argv{if{$keq"-v"}{set verbose 1}}oo::class create MDBroker {variable ctx socket verbose services workers waiting heartbeat_at endpoint
constructor{{iverbose0}}{set ctx [zmq context mdbroker_context_[::mdp::contextid]]set socket [zmq socket mdbroker_socket_[::mdp::socketid]$ctx ROUTER]set verbose $iverbose# services -> array
# workers -> array
set waiting [list]set heartbeat_at [expr{[clock milliseconds] + $::mdp::HEARTBEAT_INTERVAL}]set endpoint ""}destructor{foreach{k v}[array get services]{$vdestroy}foreach{k v}[array get workers]{$vdestroy}$socketclose$ctxterm}# Bind broker to endpoint, can call this multiple times
# We use a single socket for both clients and workers.
method bind {iendpoint}{set endpoint $iendpoint$socketbind$endpointif{$verbose}{puts"I: MDP broker is active at $endpoint"}}# Delete any idle workers that haven't pinged us in a while.
# We know that workers are ordered from oldest to most recent.
method purge_workers {}{set i 0foreach worker $waiting{if{[clock milliseconds] < [$workerexpiry]}{break;# Worker is alive, we're done here
}my worker_delete $worker0incr i
}set waiting [lrange$waiting$i end]}# Send heartbeat request to all workers
method heartbeat_workers {}{foreach worker $waiting{my worker_send $worker HEARTBEAT {}{}}set heartbeat_at [expr{[clock milliseconds] + $::mdp::HEARTBEAT_INTERVAL}]}# Locate or create new service entry
method service_require {name}{if{![info exists services($name)]}{set services($name)[MDBrokerService new $name]if{$verbose}{puts"I: added service: $name"}}return$services($name)}# Dispatch requests to waiting workers as possible
method service_dispatch {service{msg{}}}{if{[llength$msg]}{$serviceadd_request$msg}my purge_workers
while{[$serviceserviceable]}{lassign[$servicepop_worker_and_request] worker msg
set idx [lsearch$waiting$worker]if{$idx >= 0}{set waiting [lreplace$waiting$idx$idx]}my worker_send $worker REQUEST {}$msg}}# Handle internal service according to 8/MMI specification
method service_internal {service_frame msg}{if{$service_frameeq"mmi.service"}{if{[info exists services([lindex$msg end])] && [$services([lindex$msg end])has_workers]}{set return_code 200}else{set return_code 404}}else{set return_code 501}lset msg end $return_codemy rewrap_and_send $msg$service_frame}# Creates worker if necessary
method worker_require {address}{set identity [zmq zframe_strhex $address]if{![info exists workers($identity)]}{set workers($identity)[MDBrokerWorker new $address$identity]if{$verbose}{puts"I: registering new worker: $identity"}}return$workers($identity)}# Deletes worker from all data structures, and destroys worker
method worker_delete {worker disconnect}{if{$disconnect}{my worker_send $worker DISCONNECT {}{}}if{[$workerhas_service]}{$workerremove_from_service}set idx [lsearch$waiting$worker]if{$idx >= 0}{set waiting [lreplace$waiting$idx$idx]}unset workers([$workeridentity])$workerdestroy}method rewrap_and_send {msg service_frame}{# Remove & save client return envelope and insert the
# protocol header and service name, then rewrap envelope.
set client [zmsg unwrap msg]set msg [zmsg push $msg$service_frame]set msg [zmsg push $msg$::mdp::MDPC_CLIENT]set msg [zmsg wrap $msg$client]zmsg send $socket$msg}# Process message sent to us by a worker
method worker_process {sender msg}{if{[llength$msg] < 1}{error"Invalid message, need at least command"}set command [zmsg pop msg]set identity [zmq zframe_strhex $sender]set worker_ready [info exists workers($identity)]set worker [my worker_require $sender]if{$commandeq$::mdp::MDPW_COMMAND(READY)}{if{$worker_ready}{# Not first command in session
my worker_delete $worker1}elseif{[string match "mmi.*"$sender]}{# Reserved service name
my worker_delete $worker1}else{# Attach worker to service and mark as idle
set service_frame [zmsg pop msg]$workerset_service[my service_require $service_frame]my worker_waiting $worker}}elseif{$commandeq$::mdp::MDPW_COMMAND(REPLY)}{if{$worker_ready}{my rewrap_and_send $msg[[$workerservice]name]my worker_waiting $worker}else{my worker_delete $worker1}}elseif{$commandeq$::mdp::MDPW_COMMAND(HEARTBEAT)}{if{$worker_ready}{$workerupdate_expiry}else{my worker_delete $worker1}}elseif{$commandeq$::mdp::MDPW_COMMAND(DISCONNECT)}{my worker_delete $worker0}else{puts"E: invalid input message"puts[join[zmsg dump $msg]\n]}}# Send message to worker
# If pointer to message is provided, sends that message. Does not
# destroy the message, this is the caller's job.
method worker_send {worker command option msg}{# Stack protocol envelope to start of message
if{[string length $option]}{set msg [zmsg push $msg$option]}set msg [zmsg push $msg$::mdp::MDPW_COMMAND($command)]set msg [zmsg push $msg$::mdp::MDPW_WORKER]# Stack routing envelope to start of message
set msg [zmsg wrap $msg[$workeraddress]]if{$verbose}{puts"I: sending $command to worker"puts[join[zmsg dump $msg]\n]}zmsg send $socket$msg}# This worker is now waiting for work
method worker_waiting {worker}{lappend waiting $worker$workeradd_to_servicemy service_dispatch [$workerservice]}# Process a request coming from a client
method client_process {sender msg}{if{[llength$msg] < 2}{error"Invalud message, need name + body"}set service_frame [zmsg pop msg]set service [my service_require $service_frame]# Set reply return address to client sender
set msg [zmsg wrap $msg$sender]if{[string match "mmi.*"$service_frame]}{my service_internal $service_frame$msg}else{my service_dispatch $service$msg}}method socket {}{return$socket}method heartbeat_at {}{return$heartbeat_at}method verbose {}{return$verbose}}oo::class create MDBrokerService {variable name requests waiting
constructor{iname}{set name $inameset requests [list]set waiting [list]}destructor{}method serviceable {}{return[expr{[llength$waiting] && [llength$requests]}]}method add_request {msg}{lappend requests $msg}method has_workers {}{return[llength$waiting]}method pop_worker_and_request {}{set waiting [lassign$waiting worker]set requests [lassign$requests msg]return[list$worker$msg]}method add_worker {worker}{lappend waiting $worker}method remove_worker {worker}{set idx [lsearch$waiting$worker]if{$idx >= 0}{set waiting [lreplace$waiting$idx$idx]}}method name {}{return$name}}oo::class create MDBrokerWorker {variable identity address service expiry
constructor{iaddress iidentity}{set address $iaddressset identity $iidentityset service ""set expiry 0}destructor{}method has_service {}{return[string length $service]}method service {}{return$service}method expiry {}{return$expiry}method address {}{return$address}method identity {}{return$identity}method set_service {iservice}{set service $iservice}method remove_from_service {}{$serviceremove_worker[self]set service ""}method add_to_service {}{$serviceadd_worker[self]my update_expiry
}method update_expiry {}{set expiry [expr{[clock milliseconds] + $::mdp::HEARTBEAT_EXPIRY}]}}set broker [MDBroker new $verbose]$brokerbind"tcp://*:5555"# Get and process messages forever
while{1}{set poll_set [list[list[$brokersocket][list POLLIN]]]set rpoll_set [zmq poll $poll_set$::mdp::HEARTBEAT_INTERVAL]# Process next input message, if any
if{[llength$rpoll_set] && "POLLIN"in[lindex$rpoll_set01]}{set msg [zmsg recv [$brokersocket]]if{[$brokerverbose]}{puts"I: received message:"puts[join[zmsg dump $msg]\n]}set sender [zmsg pop msg]set empty [zmsg pop msg]set header [zmsg pop msg]if{$headereq$::mdp::MDPC_CLIENT}{$brokerclient_process$sender$msg}elseif{$headereq$::mdp::MDPW_WORKER}{$brokerworker_process$sender$msg}else{puts"E: invalid message:"puts[join[zmsg dump $msg]\n]}}# Disconnect and delete any expired workers
# Send heartbeats to idle workers if needed
if{[clock milliseconds] > [$brokerheartbeat_at]}{$brokerpurge_workers$brokerheartbeat_workers}}$brokerdestroy
This is by far the most complex example we’ve seen. It’s almost 500 lines of code. To write this and make it somewhat robust took two days. However, this is still a short piece of code for a full service-oriented broker.
Here are some things to note about the broker code:
The Majordomo Protocol lets us handle both clients and workers on a single socket. This is nicer for those deploying and managing the broker: it just sits on one ZeroMQ endpoint rather than the two that most proxies need.
The broker implements all of MDP/0.1 properly (as far as I know), including disconnection if the broker sends invalid commands, heartbeating, and the rest.
It can be extended to run multiple threads, each managing one socket and one set of clients and workers. This could be interesting for segmenting large architectures. The C code is already organized around a broker class to make this trivial.
A primary/failover or live/live broker reliability model is easy, as the broker essentially has no state except service presence. It’s up to clients and workers to choose another broker if their first choice isn’t up and running.
The examples use 2.5-second heartbeats (the 2,500 msec HEARTBEAT_INTERVAL in the code), mainly to reduce the amount of output when you enable tracing. Realistic values would be lower for most LAN applications. However, any retry has to be slow enough to allow for a service to restart, say 10 seconds at least.
We later improved and extended the protocol and the Majordomo implementation, which now sits in its own GitHub project. If you want a properly usable Majordomo stack, use the GitHub project.
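To make the first of those notes concrete, here is a minimal sketch in Python with pyzmq (frame handling simplified, handler behavior reduced to print statements) of how a single ROUTER socket can serve clients and workers at once: the broker reads the protocol header frame and hands the message to the client or worker logic accordingly:

import zmq

MDPC_CLIENT = b"MDPC01"      # MDP/Client protocol header
MDPW_WORKER = b"MDPW01"      # MDP/Worker protocol header

def broker_loop(endpoint="tcp://*:5555"):
    ctx = zmq.Context.instance()
    socket = ctx.socket(zmq.ROUTER)   # one socket for clients and workers alike
    socket.bind(endpoint)
    while True:
        frames = socket.recv_multipart()
        sender, empty, header = frames[0], frames[1], frames[2]
        body = frames[3:]
        if header == MDPC_CLIENT:
            print("client request from", sender, body)   # real broker: queue it for a worker
        elif header == MDPW_WORKER:
            print("worker command from", sender, body)   # real broker: READY/REPLY/HEARTBEAT/DISCONNECT
        else:
            print("E: invalid message header", header)   # log and drop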
The Majordomo implementation in the previous section is simple and stupid. The client is just the original Simple Pirate, wrapped up in a sexy API. When I fire up a client, broker, and worker on a test box, it can process 100,000 requests in about 14 seconds. That is partially due to the code, which cheerfully copies message frames around as if CPU cycles were free. But the real problem is that we’re doing network round-trips. ZeroMQ disables Nagle’s algorithm, but round-tripping is still slow.
Theory is great in theory, but in practice, practice is better. Let’s measure the actual cost of round-tripping with a simple test program. It sends a bunch of messages twice: first waiting for a reply to each message before sending the next, and then as a batch, reading all the replies back at the end. Both approaches do the same work, but they give very different results. We mock up a client, broker, and worker:
--
-- Round-trip demonstrator
--
-- While this example runs in a single process, that is just to make
-- it easier to start and stop the example. Each thread has its own
-- context and conceptually acts as a separate process.
--
-- Author: Robert G. Jakabosky <bobby@sharedrealm.com>
--
require"zmq"
require"zmq.threads"
require"zmsg"local common_code = [[
require"zmq"
require"zmsg"
require"zhelpers"
]]

local client_task = common_code .. [[
local context = zmq.init(1)
local client = context:socket(zmq.DEALER)
client:setopt(zmq.IDENTITY, "C", 1)
client:connect("tcp://localhost:5555")
printf("Setting up test...\n")
s_sleep(100)
local requests
local start
printf("Synchronous round-trip test...\n")
requests = 10000
start = s_clock()
for n=1,requests do
local msg = zmsg.new("HELLO")
msg:send(client)
msg = zmsg.recv(client)
end
printf(" %d calls/second\n",
(1000 * requests) / (s_clock() - start))
printf("Asynchronous round-trip test...\n")
requests = 100000
start = s_clock()
for n=1,requests do
local msg = zmsg.new("HELLO")
msg:send(client)
end
for n=1,requests do
local msg = zmsg.recv(client)
end
printf(" %d calls/second\n",
(1000 * requests) / (s_clock() - start))
client:close()
context:term()
]]

local worker_task = common_code .. [[
local context = zmq.init(1)
local worker = context:socket(zmq.DEALER)
worker:setopt(zmq.IDENTITY, "W", 1)
worker:connect("tcp://localhost:5556")
while true do
local msg = zmsg.recv(worker)
msg:send(worker)
end
worker:close()
context:term()
]]

local broker_task = common_code .. [[
-- Prepare our context and sockets
local context = zmq.init(1)
local frontend = context:socket(zmq.ROUTER)
local backend = context:socket(zmq.ROUTER)
frontend:bind("tcp://*:5555")
backend:bind("tcp://*:5556")
require"zmq.poller"
local poller = zmq.poller(2)
poller:add(frontend, zmq.POLLIN, function()
local msg = zmsg.recv(frontend)
--msg[1] = "W"
msg:pop()
msg:push("W")
msg:send(backend)
end)
poller:add(backend, zmq.POLLIN, function()
local msg = zmsg.recv(backend)
--msg[1] = "C"
msg:pop()
msg:push("C")
msg:send(frontend)
end)
poller:start()
frontend:close()
backend:close()
context:term()
]]
s_version_assert(2, 1)
local client = zmq.threads.runstring(nil, client_task)
assert(client:start())
local worker = zmq.threads.runstring(nil, worker_task)
assert(worker:start(true))
local broker = zmq.threads.runstring(nil, broker_task)
assert(broker:start(true))
assert(client:join())
Note that the client thread does a small pause before starting. This is to get around one of the “features” of the router socket: if you send a message with the address of a peer that’s not yet connected, the message gets discarded. In this example we don’t use the load balancing mechanism, so without the sleep, if the worker thread is too slow to connect, it will lose messages, making a mess of our test.
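Here is a minimal sketch of that ROUTER behavior in Python with pyzmq (the identity and endpoint are arbitrary): by default a message addressed to a peer that never connected simply vanishes, while the ROUTER_MANDATORY option, available in ZeroMQ 3.x and later, turns the same send into a visible error:

import zmq

ctx = zmq.Context.instance()
router = ctx.socket(zmq.ROUTER)
router.bind("inproc://routing-demo")

# Default behavior: nobody with identity "W" is connected, so this message
# is silently discarded and no error is reported anywhere.
router.send_multipart([b"W", b"", b"HELLO"])

# With ROUTER_MANDATORY set, the same unroutable send fails loudly instead,
# which makes the "worker connected too late" problem easy to spot.
router.setsockopt(zmq.ROUTER_MANDATORY, 1)
try:
    router.send_multipart([b"W", b"", b"HELLO"])
except zmq.ZMQError as exc:
    print("unroutable message rejected:", exc)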
As we see, round-tripping in the simplest case is 20 times slower than the asynchronous, “shove it down the pipe as fast as it’ll go” approach. Let’s see if we can apply this to Majordomo to make it faster.
First, we modify the client API to send and receive in two separate methods:
// mdcliapi2 class - Majordomo Protocol Client API
// Implements the MDP/Client spec at http://rfc.zeromq.org/spec:7.
#include"mdcliapi2.h"// Structure of our class
// We access these properties only via class methods
struct _mdcli_t {
zctx_t *ctx; // Our context
char *broker;
void *client; // Socket to broker
int verbose; // Print activity to stdout
int timeout; // Request timeout
};
// Connect or reconnect to broker. In this asynchronous class we use a
// DEALER socket instead of a REQ socket; this lets us send any number
// of requests without waiting for a reply.
void s_mdcli_connect_to_broker (mdcli_t *self)
{
if (self->client)
zsocket_destroy (self->ctx, self->client);
self->client = zsocket_new (self->ctx, ZMQ_DEALER);
zmq_connect (self->client, self->broker);
if (self->verbose)
zclock_log ("I: connecting to broker at %s...", self->broker);
}
// The constructor and destructor are the same as in mdcliapi, except
// we don't do retries, so there's no retries property.
// .skip
// ---------------------------------------------------------------------
// Constructor
mdcli_t *
mdcli_new (char *broker, int verbose)
{
assert (broker);
mdcli_t *self = (mdcli_t *) zmalloc (sizeof (mdcli_t));
self->ctx = zctx_new ();
self->broker = strdup (broker);
self->verbose = verbose;
self->timeout = 2500; // msecs
s_mdcli_connect_to_broker (self);
return self;
}
// Destructor
void mdcli_destroy (mdcli_t **self_p)
{
assert (self_p);
if (*self_p) {
mdcli_t *self = *self_p;
zctx_destroy (&self->ctx);
free (self->broker);
free (self);
*self_p = NULL;
}
}
// Set request timeout
void mdcli_set_timeout (mdcli_t *self, int timeout)
{
assert (self);
self->timeout = timeout;
}
// .until
// .skip
// The send method now just sends one message, without waiting for a
// reply. Since we're using a DEALER socket we have to send an empty
// frame at the start, to create the same envelope that the REQ socket
// would normally make for us:
int mdcli_send (mdcli_t *self, char *service, zmsg_t **request_p)
{
assert (self);
assert (request_p);
zmsg_t *request = *request_p;
// Prefix request with protocol frames
// Frame 0: empty (REQ emulation)
// Frame 1: "MDPCxy" (six bytes, MDP/Client x.y)
// Frame 2: Service name (printable string)
zmsg_pushstr (request, service);
zmsg_pushstr (request, MDPC_CLIENT);
zmsg_pushstr (request, "");
if (self->verbose) {
zclock_log ("I: send request to '%s' service:", service);
zmsg_dump (request);
}
zmsg_send (&request, self->client);
return 0;
}
// .skip
// The recv method waits for a reply message and returns that to the
// caller.
// ---------------------------------------------------------------------
// Returns the reply message or NULL if there was no reply. Does not
// attempt to recover from a broker failure, this is not possible
// without storing all unanswered requests and resending them all...
zmsg_t *
mdcli_recv (mdcli_t *self)
{
assert (self);
// Poll socket for a reply, with timeout
zmq_pollitem_t items [] = { { self->client, 0, ZMQ_POLLIN, 0 } };
int rc = zmq_poll (items, 1, self->timeout * ZMQ_POLL_MSEC);
if (rc == -1)
return NULL; // Interrupted
// If we got a reply, process it
if (items [0].revents & ZMQ_POLLIN) {
zmsg_t *msg = zmsg_recv (self->client);
if (self->verbose) {
zclock_log ("I: received reply:");
zmsg_dump (msg);
}
// Don't try to handle errors, just assert noisily
assert (zmsg_size (msg) >= 4);
zframe_t *empty = zmsg_pop (msg);
assert (zframe_streq (empty, ""));
zframe_destroy (&empty);
zframe_t *header = zmsg_pop (msg);
assert (zframe_streq (header, MDPC_CLIENT));
zframe_destroy (&header);
zframe_t *service = zmsg_pop (msg);
zframe_destroy (&service);
return msg; // Success
}
if (zctx_interrupted)
printf ("W: interrupt received, killing client...\n");
else if (self->verbose)
zclock_log ("W: permanent error, abandoning request");
return NULL;
}
mdcliapi2: Majordomo asynchronous client API in C++
#ifndef __MDCLIAPI_HPP_INCLUDED__
#define __MDCLIAPI_HPP_INCLUDED__
#include"zmsg.hpp"#include"mdp.h"// Structure of our class
// We access these properties only via class methods
class mdcli {
public:
// ---------------------------------------------------------------------
// Constructor
mdcli (std::string broker, int verbose): m_broker(broker), m_verbose(verbose)
{
s_version_assert (4, 0);
m_context = new zmq::context_t (1);
s_catch_signals ();
connect_to_broker ();
}
// ---------------------------------------------------------------------
// Destructor
virtual
~mdcli ()
{
delete m_client;
delete m_context;
}
// ---------------------------------------------------------------------
// Connect or reconnect to broker
void connect_to_broker ()
{
if (m_client) {
delete m_client;
}
m_client = new zmq::socket_t (*m_context, ZMQ_DEALER);
int linger = 0;
m_client->setsockopt (ZMQ_LINGER, &linger, sizeof (linger));
s_set_id(*m_client);
m_client->connect (m_broker.c_str());
if (m_verbose)
s_console ("I: connecting to broker at %s...", m_broker.c_str());
}
// ---------------------------------------------------------------------
// Set request timeout
void set_timeout (int timeout)
{
m_timeout = timeout;
}
// ---------------------------------------------------------------------
// Send request to broker
// Takes ownership of request message and destroys it when sent.
int send (std::string service, zmsg *&request_p)
{
assert (request_p);
zmsg *request = request_p;
// Prefix request with protocol frames
// Frame 0: empty (REQ emulation)
// Frame 1: "MDPCxy" (six bytes, MDP/Client x.y)
// Frame 2: Service name (printable string)
request->push_front (service.c_str());
request->push_front (k_mdp_client.data());
request->push_front ("");
if (m_verbose) {
s_console ("I: send request to '%s' service:", service.c_str());
request->dump ();
}
request->send (*m_client);
return 0;
}
// ---------------------------------------------------------------------
// Returns the reply message or NULL if there was no reply. Does not
// attempt to recover from a broker failure, this is not possible
// without storing all unanswered requests and resending them all...
zmsg *
recv ()
{
// Poll socket for a reply, with timeout
zmq::pollitem_t items[] = {
{ *m_client, 0, ZMQ_POLLIN, 0 } };
zmq::poll (items, 1, m_timeout);
// If we got a reply, process it
if (items[0].revents & ZMQ_POLLIN) {
zmsg *msg = new zmsg (*m_client);
if (m_verbose) {
s_console ("I: received reply:");
msg->dump ();
}
// Don't try to handle errors, just assert noisily
assert (msg->parts () >= 4);
assert (msg->pop_front ().length() == 0); // empty message
ustring header = msg->pop_front();
assert (header.compare((unsigned char *)k_mdp_client.data()) == 0);
ustring service = msg->pop_front();
assert (service.compare((unsigned char *)service.c_str()) == 0);
return msg; // Success
}
if (s_interrupted)
std::cout << "W: interrupt received, killing client..." << std::endl;
else if (m_verbose)
s_console ("W: permanent error, abandoning request");
return 0;
}
private:
const std::string m_broker;
zmq::context_t * m_context;
zmq::socket_t * m_client{}; // Socket to broker
const int m_verbose; // Print activity to stdout
int m_timeout{2500}; // Request timeout
};
#endif
mdcliapi2: Majordomo asynchronous client API in C#
mdcliapi2: Majordomo asynchronous client API in Lua
--
-- mdcliapi2.lua - Majordomo Protocol Client API (async version)
--
-- Author: Robert G. Jakabosky <bobby@sharedrealm.com>
--
local setmetatable = setmetatable

local mdp = require"mdp"
local zmq = require"zmq"
local zpoller = require"zmq.poller"
local zmsg = require"zmsg"
require"zhelpers"

local s_version_assert = s_version_assert

local obj_mt = {}
obj_mt.__index = obj_mt

function obj_mt:set_timeout(timeout)
    self.timeout = timeout
end

function obj_mt:destroy()
    if self.client then self.client:close() end
    self.context:term()
end

local function s_mdcli_connect_to_broker(self)
    -- close old socket.
    if self.client then
        self.poller:remove(self.client)
        self.client:close()
    end
    self.client = assert(self.context:socket(zmq.DEALER))
    assert(self.client:setopt(zmq.LINGER, 0))
    assert(self.client:connect(self.broker))
    if self.verbose then
        s_console("I: connecting to broker at %s...", self.broker)
    end
    -- add socket to poller
    self.poller:add(self.client, zmq.POLLIN, function()
        self.got_reply = true
    end)
end

--
-- Send request to broker and get reply by hook or crook
--
function obj_mt:send(service, request)
    -- Prefix request with protocol frames
    -- Frame 0: empty (REQ emulation)
    -- Frame 1: "MDPCxy" (six bytes, MDP/Client x.y)
    -- Frame 2: Service name (printable string)
    request:push(service)
    request:push(mdp.MDPC_CLIENT)
    request:push("")
    if self.verbose then
        s_console("I: send request to '%s' service:", service)
        request:dump()
    end
    request:send(self.client)
    return 0
end

-- Returns the reply message or NULL if there was no reply. Does not
-- attempt to recover from a broker failure, this is not possible
-- without storing all unanswered requests and resending them all...
function obj_mt:recv()
    self.got_reply = false
    local cnt = assert(self.poller:poll(self.timeout * 1000))
    if cnt ~= 0 and self.got_reply then
        local msg = zmsg.recv(self.client)
        if self.verbose then
            s_console("I: received reply:")
            msg:dump()
        end
        assert(msg:parts() >= 3)
        local empty = msg:pop()
        assert(empty == "")
        local header = msg:pop()
        assert(header == mdp.MDPC_CLIENT)
        return msg
    end
    if self.verbose then
        s_console("W: permanent error, abandoning request")
    end
    return nil -- Giving up
end

module(...)

function new(broker, verbose)
    s_version_assert (2, 1);
    local self = setmetatable({
        context = zmq.init(1),
        poller = zpoller.new(1),
        broker = broker,
        verbose = verbose,
        timeout = 2500, -- msecs
    }, obj_mt)
    s_mdcli_connect_to_broker(self)
    return self
end

setmetatable(_M, { __call = function(self, ...) return new(...) end })
mdcliapi2: Majordomo asynchronous client API in Node.js
mdcliapi2: Majordomo asynchronous client API in Tcl
# Majordomo Protocol Client API, Tcl version.
# Implements the MDP/Client spec at http://rfc.zeromq.org/spec:7.
package require TclOO
package require zmq
package require mdp
package provide MDClient 2.0

oo::class create MDClient {
    variable context broker verbose timeout retries client

    constructor {ibroker {iverbose 0}} {
        set context [zmq context mdcli_context_[::mdp::contextid]]
        set broker $ibroker
        set verbose $iverbose
        set timeout 2500
        set client ""
        my connect_to_broker
    }

    destructor {
        $client close
        $context term
    }

    method connect_to_broker {} {
        if {[string length $client]} {
            $client close
        }
        set client [zmq socket mdcli_socket_[::mdp::socketid] $context DEALER]
        $client connect $broker
        if {$verbose} {
            puts "I: connecting to broker at $broker..."
        }
    }

    method set_timeout {itimeout} {
        set timeout $itimeout
    }

    # Send request to broker
    # Takes ownership of request message and destroys it when sent.
    method send {service request} {
        # Prefix request with protocol frames
        # Frame 0: empty (REQ emulation)
        # Frame 1: "MDPCxy" (six bytes, MDP/Client x.y)
        # Frame 2: Service name (printable string)
        set request [zmsg push $request $service]
        set request [zmsg push $request $mdp::MDPC_CLIENT]
        set request [zmsg push $request ""]
        if {$verbose} {
            puts "I: send request to '$service' service:"
            puts [join [zmsg dump $request] \n]
        }
        zmsg send $client $request
    }

    # Returns the reply message or NULL if there was no reply. Does not
    # attempt to recover from a broker failure, this is not possible
    # without storing all unanswered requests and resending them all...
    method recv {} {
        # Poll socket for a reply, with timeout
        set poll_set [list [list $client [list POLLIN]]]
        set rpoll_set [zmq poll $poll_set $timeout]
        # If we got a reply, process it
        if {[llength $rpoll_set] && "POLLIN" in [lindex $rpoll_set 0 1]} {
            set msg [zmsg recv $client]
            if {$verbose} {
                puts "I: received reply:"
                puts [join [zmsg dump $msg] \n]
            }
            # Don't try to handle errors, just assert noisily
            if {[llength $msg] < 4} {
                error "message size < 4"
            }
            set empty [zmsg pop msg]
            if {[string length $empty]} {
                error "expected empty frame"
            }
            set header [zmsg pop msg]
            if {$header ne $mdp::MDPC_CLIENT} {
                error "unexpected header"
            }
            set service [zmsg pop msg]
            return $msg ;# Success
        }
        if {$verbose} {
            puts "W: permanent error, abandoning"
        }
        return {}
    }
}
mdcliapi2: Majordomo asynchronous client API in OCaml
mdclient2: Majordomo client application in Ruby
#!/usr/bin/env ruby

# Majordomo Protocol client example. Uses the mdcliapi2 API to hide all MDP aspects
#
# Author : Tom van Leeuwen <tom@vleeuwen.eu>
# Based on Python example by Min RK

require './mdcliapi2.rb'

client = MajorDomoClient.new('tcp://localhost:5555')

requests = 100000
requests.times do |i|
  request = 'Hello world'
  begin
    client.send('echo', request)
  end
end

count = 0
while count < requests do
  begin
    reply = client.recv
  end
  count += 1
end
puts "#{count} requests/replies processed"
The broker and worker are unchanged because we’ve not modified the protocol at all. We see an immediate improvement in performance. Here’s the synchronous client chugging through 100K request-reply cycles:
$ time mdclient
100000 requests/replies processed
real 0m14.088s
user 0m1.310s
sys 0m2.670s
And here’s the asynchronous client, with a single worker:
$ time mdclient2
100000 replies received
real 0m8.730s
user 0m0.920s
sys 0m1.550s
Twice as fast. Not bad, but let's fire up 10 workers and see how it handles the traffic:
$ time mdclient2
100000 replies received
real 0m3.863s
user 0m0.730s
sys 0m0.470s
It isn’t fully asynchronous because workers get their messages on a strict last-used basis. But it will scale better with more workers. On my PC, after eight or so workers, it doesn’t get any faster. Four cores only stretches so far. But we got a 4x improvement in throughput with just a few minutes’ work. The broker is still unoptimized. It spends most of its time copying message frames around, instead of doing zero-copy, which it could. But we’re getting 25K reliable request/reply calls a second, with pretty low effort.
However, the asynchronous Majordomo pattern isn't all roses. It has a fundamental weakness, namely that it cannot survive a broker crash without more work. If you look at the mdcliapi2 code you'll see it does not attempt to reconnect after a failure. A proper reconnect would require the following (a sketch of the client-side bookkeeping follows the list):
A number on every request and a matching number on every reply, which would ideally require a change to the protocol to enforce.
Tracking and holding onto all outstanding requests in the client API, i.e., those for which no reply has yet been received.
In case of failover, for the client API to resend all outstanding requests to the broker.
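None of this is part of mdcliapi2 today. As a rough, hedged sketch of the bookkeeping involved (the reliable_client_t type, its fields, and s_reliable_send are hypothetical names, not guide APIs), the client side could look something like this in the same czmq style:

#include "mdcliapi2.c"

//  Hypothetical wrapper state: what a reconnecting client would carry on top
//  of the existing mdcli_t session.
typedef struct {
    mdcli_t *session;           //  Underlying Majordomo session
    unsigned int sequence;      //  Number stamped on every outgoing request
    zhash_t *outstanding;       //  Unanswered requests, keyed by that number
} reliable_client_t;

//  Stamp the request, keep a copy for possible resending, then send it.
//  This relies on services echoing the number frame back in their replies,
//  which is an application-level convention, not something MDP enforces.
static int
s_reliable_send (reliable_client_t *self, char *service, zmsg_t **request_p)
{
    char key [16];
    snprintf (key, sizeof (key), "%u", ++self->sequence);
    zmsg_pushstr (*request_p, key);
    zhash_insert (self->outstanding, key, zmsg_dup (*request_p));
    return mdcli_send (self->session, service, request_p);
}

//  On every reply, the client would pop the number frame and zhash_delete()
//  the matching entry; after a reconnect it would walk the hash and resend
//  every request still sitting in it.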
It’s not a deal breaker, but it does show that performance often means complexity. Is this worth doing for Majordomo? It depends on your use case. For a name lookup service you call once per session, no. For a web frontend serving thousands of clients, probably yes.
So, we have a nice service-oriented broker, but we have no way of knowing whether a particular service is available or not. We know whether a request failed, but we don’t know why. It is useful to be able to ask the broker, “is the echo service running?” The most obvious way would be to modify our MDP/Client protocol to add commands to ask this. But MDP/Client has the great charm of being simple. Adding service discovery to it would make it as complex as the MDP/Worker protocol.
Another option is to do what email does, and ask that undeliverable requests be returned. This can work well in an asynchronous world, but it also adds complexity. We need ways to distinguish returned requests from replies and to handle these properly.
Let’s try to use what we’ve already built, building on top of MDP instead of modifying it. Service discovery is, itself, a service. It might indeed be one of several management services, such as “disable service X”, “provide statistics”, and so on. What we want is a general, extensible solution that doesn’t affect the protocol or existing applications.
So here’s a small RFC that layers this on top of MDP:
the Majordomo Management Interface (MMI). We already implemented it in the broker, though unless you read the whole thing you probably missed that. I’ll explain how it works in the broker:
When a client requests a service that starts with mmi., instead of routing this to a worker, we handle it internally.
We handle just one service in this broker, which is mmi.service, the service discovery service.
The payload for the request is the name of an external service (a real one, provided by a worker).
The broker returns “200” (OK) or “404” (Not found), depending on whether there are workers registered for that service or not.
Here’s how we use the service discovery in an application:
// MMI echo query example
// Lets us build this source without creating a library
#include"mdcliapi.c"intmain (int argc, char *argv [])
{
int verbose = (argc > 1 && streq (argv [1], "-v"));
mdcli_t *session = mdcli_new ("tcp://localhost:5555", verbose);
// This is the service we want to look up
zmsg_t *request = zmsg_new ();
zmsg_addstr (request, "echo");
// This is the service we send our request to
zmsg_t *reply = mdcli_send (session, "mmi.service", &request);
if (reply) {
char *reply_code = zframe_strdup (zmsg_first (reply));
printf ("Lookup echo service: %s\n", reply_code);
free (reply_code);
zmsg_destroy (&reply);
}
else
printf ("E: no response from broker, make sure it's running\n");
mdcli_destroy (&session);
return 0;
}
mmiecho: Service discovery over Majordomo in Haxe
package ;
import neko.Lib;
import neko.Sys;
import org.zeromq.ZMsg;
/**
* MMI echo query example
 */
class MMIEcho
{
public static function main() {
Lib.println("** MMIEcho (see: http://zguide.zeromq.org/page:all#Service-Discovery)");
var argArr = Sys.args();
var verbose = (argArr.length > 1 && argArr[argArr.length - 1] == "-v");
var session = new MDCliAPI("tcp://localhost:5555", verbose);
// This is the service we want to look up
var request = new ZMsg();
request.addString("echo");
// This is the service we send our request to
var reply = session.send("mmi.service", request);
if (reply != null) {
var replyCode = reply.first().toString();
Lib.println("Lookup echo service: " + replyCode);
} else
Lib.println("E: no response from broker, make sure it's running");
session.destroy();
}
}
mmiecho: Service discovery over Majordomo in Java
package guide;
importorg.zeromq.ZMsg;
/**
* MMI echo query example
 */
public class mmiecho
{
public static void main(String[] args)
{
boolean verbose = (args.length > 0 && "-v".equals(args[0]));
mdcliapi clientSession = new mdcliapi("tcp://localhost:5555", verbose);
ZMsg request = new ZMsg();
// This is the service we want to look up
request.addString("echo");
// This is the service we send our request to
ZMsg reply = clientSession.send("mmi.service", request);
if (reply != null) {
String replyCode = reply.getFirst().toString();
System.out.printf("Lookup echo service: %s\n", replyCode);
}
else {
System.out.println("E: no response from broker, make sure it's running");
}
clientSession.destroy();
}
}
mmiecho: Service discovery over Majordomo in Julia
mmiecho: Service discovery over Majordomo in Lua
--
-- MMI echo query example
--
-- Author: Robert G. Jakabosky <bobby@sharedrealm.com>
--
require"mdcliapi"
require"zmsg"
require"zhelpers"local verbose = (arg[1] == "-v")
local session = mdcliapi.new("tcp://localhost:5555", verbose)
-- This is the service we want to look up
local request = zmsg.new("echo")
-- This is the service we send our request to
local reply = session:send("mmi.service", request)
if (reply) then
printf ("Lookup echo service: %s\n", reply:body())
else
printf ("E: no response from broker, make sure it's running\n")
end
session:destroy()
mmiecho: Service discovery over Majordomo in Node.js
mmiecho: Service discovery over Majordomo in PHP
<?php
/*
* MMI echo query example
*
* @author Ian Barber <ian(dot)barber(at)gmail(dot)com>
 */
include 'mdcliapi.php';
$verbose = $_SERVER['argc'] > 1 && $_SERVER['argv'][1] == '-v';
$session = new MDCli("tcp://localhost:5555", $verbose);
// This is the service we want to look up
$request = new Zmsg();
$request->body_set("echo");
// This is the service we send our request to
$reply = $session->send("mmi.service", $request);
if ($reply) {
$reply_code = $reply->pop();
printf ("Lookup echo service: %s %s", $reply_code, PHP_EOL);
}
mmiecho: Service discovery over Majordomo in Python
"""
MMI echo query example
Author : Min RK <benjaminrk@gmail.com>
"""importsysfrommdcliapiimport MajorDomoClient
defmain():
verbose = '-v'in sys.argv
client = MajorDomoClient("tcp://localhost:5555", verbose)
request = b"echo"
reply = client.send(b"mmi.service", request)
if reply:
replycode = reply[0]
print ("Lookup echo service:", replycode)
else:
print ("E: no response from broker, make sure it's running")
if __name__ == '__main__':
main()
Try this with and without a worker running, and you should see the little program report “200” or “404” accordingly. The implementation of MMI in our example broker is flimsy. For example, if a worker disappears, services remain “present”. In practice, a broker should remove services that have no workers after some configurable timeout.
Idempotency is not something you take a pill for. What it means is that it's safe to repeat an operation. Checking the clock is idempotent. Lending one's credit card to one's children is not. While many client-to-server use cases are idempotent, some are not. Examples of idempotent use cases include:
Stateless task distribution, i.e., a pipeline where the servers are stateless workers that compute a reply based purely on the state provided by a request. In such a case, it’s safe (though inefficient) to execute the same request many times.
A name service that translates logical addresses into endpoints to bind or connect to. In such a case, it’s safe to make the same lookup request many times.
And here are examples of non-idempotent use cases:
A logging service. One does not want the same log information recorded more than once.
Any service that has impact on downstream nodes, e.g., sends on information to other nodes. If that service gets the same request more than once, downstream nodes will get duplicate information.
Any service that modifies shared data in some non-idempotent way; e.g., a service that debits a bank account is not idempotent without extra work.
When our server applications are not idempotent, we have to think more carefully about when exactly they might crash. If an application dies when it’s idle, or while it’s processing a request, that’s usually fine. We can use database transactions to make sure a debit and a credit are always done together, if at all. If the server dies while sending its reply, that’s a problem, because as far as it’s concerned, it has done its work.
If the network dies just as the reply is making its way back to the client, the same problem arises. The client will think the server died and will resend the request, and the server will do the same work twice, which is not what we want.
To handle non-idempotent operations, use the fairly standard solution of detecting and rejecting duplicate requests (a sketch follows the list). This means:
The client must stamp every request with a unique client identifier and a unique message number.
The server, before sending back a reply, stores it using the combination of client ID and message number as a key.
The server, when getting a request from a given client, first checks whether it has a reply for that client ID and message number. If so, it does not process the request, but just resends the reply.
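Here is a hedged sketch of the server-side half of that scheme, again in czmq style; the composite key format and the helper names (s_check_duplicate, s_store_reply) are illustrative, not part of Majordomo:

#include <czmq.h>

//  Reply cache keyed by "client_id:message_number". A real server would also
//  expire old entries; this sketch only shows the lookup-before-work step.
static zmsg_t *
s_check_duplicate (zhash_t *reply_cache, char *client_id, char *msg_number)
{
    char key [256];
    snprintf (key, sizeof (key), "%s:%s", client_id, msg_number);
    zmsg_t *cached = (zmsg_t *) zhash_lookup (reply_cache, key);
    //  If we already answered this request, resend the stored reply verbatim
    //  instead of doing the (non-idempotent) work again.
    return cached? zmsg_dup (cached): NULL;
}

//  Called just before sending a reply, so a repeated request can be answered
//  from the cache next time.
static void
s_store_reply (zhash_t *reply_cache, char *client_id, char *msg_number, zmsg_t *reply)
{
    char key [256];
    snprintf (key, sizeof (key), "%s:%s", client_id, msg_number);
    zhash_insert (reply_cache, key, zmsg_dup (reply));
}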
Once you realize that Majordomo is a “reliable” message broker, you might be tempted to add some spinning rust (that is, ferrous-based hard disk platters). After all, this works for all the enterprise messaging systems. It’s such a tempting idea that it’s a little sad to have to be negative toward it. But brutal cynicism is one of my specialties. So, some reasons you don’t want rust-based brokers sitting in the center of your architecture are:
As you’ve seen, the Lazy Pirate client performs surprisingly well. It works across a whole range of architectures, from direct client-to-server to distributed queue proxies. It does tend to assume that workers are stateless and idempotent. But we can work around that limitation without resorting to rust.
Rust brings a whole set of problems, from slow performance to additional pieces that you have to manage, repair, and handle 6 a.m. panics from, as they inevitably break at the start of daily operations. The beauty of the Pirate patterns in general is their simplicity. They won’t crash. And if you’re still worried about the hardware, you can move to a peer-to-peer pattern that has no broker at all. I’ll explain later in this chapter.
Having said this, however, there is one sane use case for rust-based reliability, which is an asynchronous disconnected network. It solves a major problem with Pirate, namely that a client has to wait for an answer in real time. If clients and workers are only sporadically connected (think of email as an analogy), we can’t use a stateless network between clients and workers. We have to put state in the middle.
So, here’s the Titanic pattern, in which we write messages to disk to ensure they never get lost, no matter how sporadically clients and workers are connected. As we did for service discovery, we’re going to layer Titanic on top of MDP rather than extend it. It’s wonderfully lazy because it means we can implement our fire-and-forget reliability in a specialized worker, rather than in the broker. This is excellent for several reasons:
It is much easier because we divide and conquer: the broker handles message routing and the worker handles reliability.
It lets us mix brokers written in one language with workers written in another.
It lets us evolve the fire-and-forget technology independently.
The only downside is that there’s an extra network hop between broker and hard disk. The benefits are easily worth it.
There are many ways to make a persistent request-reply architecture. We’ll aim for one that is simple and painless. The simplest design I could come up with, after playing with this for a few hours, is a “proxy service”. That is, Titanic doesn’t affect workers at all. If a client wants a reply immediately, it talks directly to a service and hopes the service is available. If a client is happy to wait a while, it talks to Titanic instead and asks, “hey, buddy, would you take care of this for me while I go buy my groceries?”
Figure 51 - The Titanic Pattern
Titanic is thus both a worker and a client. The dialog between client and Titanic goes along these lines:
Client: Please accept this request for me. Titanic: OK, done.
Client: Do you have a reply for me? Titanic: Yes, here it is. Or, no, not yet.
Client: OK, you can wipe that request now, I’m happy. Titanic: OK, done.
Whereas the dialog between Titanic and broker and worker goes like this:
Titanic: Hey, Broker, is there a coffee service? Broker: Uhm, yeah, seems like.
Titanic: Hey, coffee service, please handle this for me.
Coffee: Sure, here you are.
Titanic: Sweeeeet!
You can work through this and the possible failure scenarios. If a worker crashes while processing a request, Titanic retries indefinitely. If a reply gets lost somewhere, Titanic will retry. If the request gets processed but the client doesn’t get the reply, it will ask again. If Titanic crashes while processing a request or a reply, the client will try again. As long as requests are fully committed to safe storage, work can’t get lost.
The handshaking is pedantic, but can be pipelined, i.e., clients can use the asynchronous Majordomo pattern to do a lot of work and then get the responses later.
We need some way for a client to request its replies. We’ll have many clients asking for the same services, and clients disappear and reappear with different identities. Here is a simple, reasonably secure solution:
Every request generates a universally unique ID (UUID), which Titanic returns to the client after it has queued the request.
When a client asks for a reply, it must specify the UUID for the original request.
In a realistic case, the client would want to store its request UUIDs safely, e.g., in a local database.
Before we jump off and write yet another formal specification (fun, fun!), let’s consider how the client talks to Titanic. One way is to use a single service and send it three different request types. Another way, which seems simpler, is to use three services:
titanic.request: store a request message, and return a UUID for the request.
titanic.reply: fetch a reply, if available, for a given request UUID.
titanic.close: confirm that a reply has been stored and processed.
We’ll just make a multithreaded worker, which as we’ve seen from our multithreading experience with ZeroMQ, is trivial. However, let’s first sketch what Titanic would look like in terms of ZeroMQ messages and frames. This gives us the
Titanic Service Protocol (TSP).
Using TSP is clearly more work for client applications than accessing a service directly via MDP. Here’s the shortest robust “echo” client example:
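The full listing isn't reproduced here, but as a hedged sketch (not the guide's client program), the TSP conversation boils down to three calls through the synchronous mdcliapi seen earlier; the "200"/"300"/"400" codes are the ones the Titanic worker shown further below actually returns:

#include "mdcliapi.c"

int main (void)
{
    mdcli_t *session = mdcli_new ("tcp://localhost:5555", false);

    //  1. Store the request via titanic.request: the first payload frame
    //  names the real service ("echo"), the rest is the body. Titanic
    //  answers with "200" plus a UUID for the stored request.
    zmsg_t *request = zmsg_new ();
    zmsg_addstr (request, "echo");
    zmsg_addstr (request, "Hello world");
    zmsg_t *reply = mdcli_send (session, "titanic.request", &request);
    if (!reply)
        return 1;                       //  Titanic itself is not reachable
    char *status = zmsg_popstr (reply);
    char *uuid = zmsg_popstr (reply);
    zmsg_destroy (&reply);
    free (status);

    //  2. Poll titanic.reply with that UUID until the answer is ready
    //  ("300" means pending, "400" means unknown request).
    while (!zctx_interrupted) {
        request = zmsg_new ();
        zmsg_addstr (request, uuid);
        reply = mdcli_send (session, "titanic.reply", &request);
        if (!reply)
            break;                      //  Interrupted or timed out
        char *code = zmsg_popstr (reply);
        int ready = streq (code, "200");
        free (code);
        if (ready) {
            char *body = zmsg_popstr (reply);
            printf ("Reply: %s\n", body);
            free (body);
            zmsg_destroy (&reply);
            break;
        }
        zmsg_destroy (&reply);
        zclock_sleep (1000);            //  Try again in a second
    }
    //  3. Tell Titanic it can forget the request now
    request = zmsg_new ();
    zmsg_addstr (request, uuid);
    reply = mdcli_send (session, "titanic.close", &request);
    zmsg_destroy (&reply);

    free (uuid);
    mdcli_destroy (&session);
    return 0;
}

A production wrapper would also persist the UUID (as suggested above), retry titanic.request itself, and check the status codes this sketch ignores.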
Of course this can be, and should be, wrapped up in some kind of framework or API. It’s not healthy to ask average application developers to learn the full details of messaging: it hurts their brains, costs time, and offers too many ways to make buggy complexity. Additionally, it makes it hard to add intelligence.
For example, this client blocks on each request whereas in a real application, we’d want to be doing useful work while tasks are executed. This requires some nontrivial plumbing to build a background thread and talk to that cleanly. It’s the kind of thing you want to wrap in a nice simple API that the average developer cannot misuse. It’s the same approach that we used for Majordomo.
Here’s the Titanic implementation. This server handles the three services using three threads, as proposed. It does full persistence to disk using the most brutal approach possible: one file per message. It’s so simple, it’s scary. The only complex part is that it keeps a separate queue of all requests, to avoid reading the directory over and over:
// Titanic service
// Implements server side of http://rfc.zeromq.org/spec:9
// Lets us build this source without creating a library
#include"mdwrkapi.c"#include"mdcliapi.c"#include"zfile.h"#include<uuid/uuid.h>// Return a new UUID as a printable character string
// Caller must free returned string when finished with it
static char *
s_generate_uuid (void)
{
char hex_char [] = "0123456789ABCDEF";
char *uuidstr = zmalloc (sizeof (uuid_t) * 2 + 1);
uuid_t uuid;
uuid_generate (uuid);
int byte_nbr;
for (byte_nbr = 0; byte_nbr < sizeof (uuid_t); byte_nbr++) {
uuidstr [byte_nbr * 2 + 0] = hex_char [uuid [byte_nbr] >> 4];
uuidstr [byte_nbr * 2 + 1] = hex_char [uuid [byte_nbr] & 15];
}
return uuidstr;
}
// Returns freshly allocated request filename for given UUID
#define TITANIC_DIR ".titanic"
static char *
s_request_filename (char *uuid) {
char *filename = malloc (256);
snprintf (filename, 256, TITANIC_DIR "/%s.req", uuid);
return filename;
}
// Returns freshly allocated reply filename for given UUID
static char *
s_reply_filename (char *uuid) {
char *filename = malloc (256);
snprintf (filename, 256, TITANIC_DIR "/%s.rep", uuid);
return filename;
}
// .split Titanic request service
// The {{titanic.request}} task waits for requests to this service. It writes
// each request to disk and returns a UUID to the client. The client picks
// up the reply asynchronously using the {{titanic.reply}} service:
static void titanic_request (void *args, zctx_t *ctx, void *pipe)
{
mdwrk_t *worker = mdwrk_new (
"tcp://localhost:5555", "titanic.request", 0);
zmsg_t *reply = NULL;
while (true) {
// Send reply if it's not null
// And then get next request from broker
zmsg_t *request = mdwrk_recv (worker, &reply);
if (!request)
break; // Interrupted, exit
// Ensure message directory exists
zfile_mkdir (TITANIC_DIR);
// Generate UUID and save message to disk
char *uuid = s_generate_uuid ();
char *filename = s_request_filename (uuid);
FILE *file = fopen (filename, "w");
assert (file);
zmsg_save (request, file);
fclose (file);
free (filename);
zmsg_destroy (&request);
// Send UUID through to message queue
reply = zmsg_new ();
zmsg_addstr (reply, uuid);
zmsg_send (&reply, pipe);
// Now send UUID back to client
// Done by the mdwrk_recv() at the top of the loop
reply = zmsg_new ();
zmsg_addstr (reply, "200");
zmsg_addstr (reply, uuid);
free (uuid);
}
mdwrk_destroy (&worker);
}
// .split Titanic reply service
// The {{titanic.reply}} task checks if there's a reply for the specified
// request (by UUID), and returns a 200 (OK), 300 (Pending), or 400
// (Unknown) accordingly:
static void *
titanic_reply (void *context)
{
mdwrk_t *worker = mdwrk_new (
"tcp://localhost:5555", "titanic.reply", 0);
zmsg_t *reply = NULL;
while (true) {
zmsg_t *request = mdwrk_recv (worker, &reply);
if (!request)
break; // Interrupted, exit
char *uuid = zmsg_popstr (request);
char *req_filename = s_request_filename (uuid);
char *rep_filename = s_reply_filename (uuid);
if (zfile_exists (rep_filename)) {
FILE *file = fopen (rep_filename, "r");
assert (file);
reply = zmsg_load (NULL, file);
zmsg_pushstr (reply, "200");
fclose (file);
}
else {
reply = zmsg_new ();
if (zfile_exists (req_filename))
zmsg_pushstr (reply, "300"); //Pending
else
zmsg_pushstr (reply, "400"); //Unknown
}
zmsg_destroy (&request);
free (uuid);
free (req_filename);
free (rep_filename);
}
mdwrk_destroy (&worker);
return 0;
}
// .split Titanic close task
// The {{titanic.close}} task removes any waiting replies for the request
// (specified by UUID). It's idempotent, so it is safe to call more than
// once in a row:
static void *
titanic_close (void *context)
{
mdwrk_t *worker = mdwrk_new (
"tcp://localhost:5555", "titanic.close", 0);
zmsg_t *reply = NULL;
while (true) {
zmsg_t *request = mdwrk_recv (worker, &reply);
if (!request)
break; // Interrupted, exit
char *uuid = zmsg_popstr (request);
char *req_filename = s_request_filename (uuid);
char *rep_filename = s_reply_filename (uuid);
zfile_delete (req_filename);
zfile_delete (rep_filename);
free (uuid);
free (req_filename);
free (rep_filename);
zmsg_destroy (&request);
reply = zmsg_new ();
zmsg_addstr (reply, "200");
}
mdwrk_destroy (&worker);
return 0;
}
// .split worker task
// This is the main thread for the Titanic worker. It starts three child
// threads; for the request, reply, and close services. It then dispatches
// requests to workers using a simple brute force disk queue. It receives
// request UUIDs from the {{titanic.request}} service, saves these to a disk
// file, and then throws each request at MDP workers until it gets a
// response.
static int s_service_success (char *uuid);

int main (int argc, char *argv [])
{
int verbose = (argc > 1 && streq (argv [1], "-v"));
zctx_t *ctx = zctx_new ();
void *request_pipe = zthread_fork (ctx, titanic_request, NULL);
zthread_new (titanic_reply, NULL);
zthread_new (titanic_close, NULL);
// Main dispatcher loop
while (true) {
// We'll dispatch once per second, if there's no activity
zmq_pollitem_t items [] = { { request_pipe, 0, ZMQ_POLLIN, 0 } };
int rc = zmq_poll (items, 1, 1000 * ZMQ_POLL_MSEC);
if (rc == -1)
break; // Interrupted
if (items [0].revents & ZMQ_POLLIN) {
// Ensure message directory exists
zfile_mkdir (TITANIC_DIR);
// Append UUID to queue, prefixed with '-' for pending
zmsg_t *msg = zmsg_recv (request_pipe);
if (!msg)
break; // Interrupted
FILE *file = fopen (TITANIC_DIR "/queue", "a");
char *uuid = zmsg_popstr (msg);
fprintf (file, "-%s\n", uuid);
fclose (file);
free (uuid);
zmsg_destroy (&msg);
}
// Brute force dispatcher
char entry [] = "?.......:.......:.......:.......:";
FILE *file = fopen (TITANIC_DIR "/queue", "r+");
while (file && fread (entry, 33, 1, file) == 1) {
// UUID is prefixed with '-' if still waiting
if (entry [0] == '-') {
if (verbose)
printf ("I: processing request %s\n", entry + 1);
if (s_service_success (entry + 1)) {
// Mark queue entry as processed
fseek (file, -33, SEEK_CUR);
fwrite ("+", 1, 1, file);
fseek (file, 32, SEEK_CUR);
}
}
// Skip end of line, LF or CRLF
if (fgetc (file) == '\r')
fgetc (file);
if (zctx_interrupted)
break;
}
if (file)
fclose (file);
}
return 0;
}
// .split try to call a service
// Here, we first check if the requested MDP service is defined or not,
// using a MMI lookup to the Majordomo broker. If the service exists,
// we send a request and wait for a reply using the conventional MDP
// client API. This is not meant to be fast, just very simple:
static int s_service_success (char *uuid)
{
// Load request message, service will be first frame
char *filename = s_request_filename (uuid);
FILE *file = fopen (filename, "r");
free (filename);
// If the client already closed request, treat as successful
if (!file)
return 1;
zmsg_t *request = zmsg_load (NULL, file);
fclose (file);
zframe_t *service = zmsg_pop (request);
char *service_name = zframe_strdup (service);
// Create MDP client session with short timeout
mdcli_t *client = mdcli_new ("tcp://localhost:5555", false);
mdcli_set_timeout (client, 1000); // 1 sec
mdcli_set_retries (client, 1); // only 1 retry
// Use MMI protocol to check if service is available
zmsg_t *mmi_request = zmsg_new ();
zmsg_add (mmi_request, service);
zmsg_t *mmi_reply = mdcli_send (client, "mmi.service", &mmi_request);
int service_ok = (mmi_reply
&& zframe_streq (zmsg_first (mmi_reply), "200"));
zmsg_destroy (&mmi_reply);
int result = 0;
if (service_ok) {
zmsg_t *reply = mdcli_send (client, service_name, &request);
if (reply) {
filename = s_reply_filename (uuid);
FILE *file = fopen (filename, "w");
assert (file);
zmsg_save (reply, file);
fclose (file);
free (filename);
result = 1;
}
zmsg_destroy (&reply);
}
else
zmsg_destroy (&request);
mdcli_destroy (&client);
free (service_name);
return result;
}
titanic: Titanic broker example in C++
#include <iostream>
#include <random>
#include <sstream>
#include <iomanip>
#include <thread>
#include <filesystem>
#include <fstream>
#include "mdcliapi.hpp"
#include "mdwrkapi.hpp"

#define ZMQ_POLL_MSEC 1
#define BROKER_ENDPOINT "tcp://localhost:5555"
std::string generateUUID() {
std::random_device rd;
std::mt19937 gen(rd());
std::uniform_int_distribution<> dis(0, 15);
std::uniform_int_distribution<> dis2(8, 11);
std::stringstream ss;
ss << std::hex;
for (int i = 0; i < 8; ++i) ss << dis(gen);
// ss << "-";
for (int i = 0; i < 4; ++i) ss << dis(gen);
ss << "4"; // UUID version 4
for (int i = 0; i < 3; ++i) ss << dis(gen);
// ss << "-";
ss << dis2(gen); // UUID variant
for (int i = 0; i < 3; ++i) ss << dis(gen);
// ss << "-";
for (int i = 0; i < 12; ++i) ss << dis(gen);
return ss.str();
}
// Returns freshly allocated request filename for given UUID
#define TITANIC_DIR ".titanic"
static std::string s_request_filename(std::string uuid) {
return std::string(TITANIC_DIR) + "/" + uuid + ".req";
}
// Returns freshly allocated reply filename for given UUID
static std::string s_reply_filename(std::string uuid) {
return std::string(TITANIC_DIR) + "/" + uuid + ".rep";
}
static bool s_zmsg_save(zmsg *msg, const std::filesystem::path &filepath) {
std::ofstream ofs(filepath);
if (ofs) {
while (msg->parts() > 0)
{
ofs << msg->pop_front().c_str() << std::endl;
}
std::cout << "File created and data written: " << filepath << std::endl;
return true;
}
else {
std::cerr << "Failed to create or write to file: " << filepath << std::endl;
return false;
}
}
static zmsg* s_zmsg_load(const std::filesystem::path &filepath) {
zmsg *msg = new zmsg();
std::ifstream ifs(filepath);
if (ifs) {
std::string line;
while (std::getline(ifs, line)) {
msg->push_back(line.c_str());
}
ifs.close();
return msg;
}
else {
std::cerr << "Failed to read file: " << filepath << std::endl;
return nullptr;
}
}
// .split Titanic request service
// The {{titanic.request}} task waits for requests to this service. It writes
// each request to disk and returns a UUID to the client. The client picks
// up the reply asynchronously using the {{titanic.reply}} service:
static void titanic_request(zmq::context_t *ctx) {
mdwrk *worker = new mdwrk(BROKER_ENDPOINT, "titanic.request", 0);
worker->set_heartbeat(3000);
zmsg *reply = nullptr;
// communicate with parent thread
zmq::socket_t pipe(*ctx, ZMQ_PAIR);
pipe.bind("inproc://titanic_request");
while (true) {
// Send reply if it's not null
// And then get next request from broker
zmsg *request = worker->recv(reply);
std::cout << "titanic_request: received request" << std::endl;
request->dump();
if (!request) {
break; // Interrupted, exit
}
// Ensure message directory exists
std::filesystem::path titanic_dir(TITANIC_DIR);
std::filesystem::create_directory(titanic_dir);
// std::cout << "I: creating " << TITANIC_DIR << " directory" << std::endl;
// Generate UUID and save message to disk
std::string uuid = generateUUID();
std::filesystem::path request_file(s_request_filename(uuid));
if (!s_zmsg_save(request, request_file)) {
break; // dump file failed, exit
}
delete request;
// Send UUID through to message queue
reply = new zmsg(uuid.c_str());
reply->send(pipe);
std::cout << "titanic_request: sent reply to parent" << std::endl;
// Now send UUID back to client
// Done by the mdwrk_recv() at the top of the loop
reply = new zmsg("200");
reply->push_back(uuid.c_str());
}
delete worker;
return;
}
// .split Titanic reply service
// The {{titanic.reply}} task checks if there's a reply for the specified
// request (by UUID), and returns a 200 (OK), 300 (Pending), or 400
// (Unknown) accordingly:
static void titanic_reply(zmq::context_t *ctx) {
mdwrk *worker = new mdwrk(BROKER_ENDPOINT, "titanic.reply", 0);
worker->set_heartbeat(3000);
zmsg *reply = nullptr;
while(true) {
zmsg *request = worker->recv(reply);
if (!request) {
break; // Interrupted, exit
}
std::string uuid = std::string((char *)request->pop_front().c_str());
std::filesystem::path request_filename(s_request_filename(uuid));
std::filesystem::path reply_filename(s_reply_filename(uuid));
// Try to read the reply file
if (std::filesystem::exists(reply_filename)) {
reply = s_zmsg_load(reply_filename);
assert(reply);
reply->push_front("200");
}
else {
reply = new zmsg();
if (std::filesystem::exists(request_filename)) {
reply->push_front("300"); //Pending
}
else {
reply->push_front("400"); //Unknown
}
}
delete request;
}
delete worker;
}
// .split Titanic close task
// The {{titanic.close}} task removes any waiting replies for the request
// (specified by UUID). It's idempotent, so it is safe to call more than
// once in a row:
static void titanic_close(zmq::context_t *ctx) {
mdwrk *worker = new mdwrk(BROKER_ENDPOINT, "titanic.close", 0);
worker->set_heartbeat(3000);
zmsg *reply = nullptr;
while (true) {
zmsg *request = worker->recv(reply);
if (!request) {
break; // Interrupted, exit
}
std::string uuid = std::string((char *)request->pop_front().c_str());
std::filesystem::path request_filename(s_request_filename(uuid));
std::filesystem::path reply_filename(s_reply_filename(uuid));
std::filesystem::remove(request_filename);
std::filesystem::remove(reply_filename);
delete request;
reply = new zmsg("200");
}
delete worker;
return;
}
// .split worker task
// This is the main thread for the Titanic worker. It starts three child
// threads; for the request, reply, and close services. It then dispatches
// requests to workers using a simple brute force disk queue. It receives
// request UUIDs from the {{titanic.request}} service, saves these to a disk
// file, and then throws each request at MDP workers until it gets a
// response.
static bool s_service_success(std::string uuid);
// simulate zthread_fork, create attached thread and return the pipe socket
std::pair<std::thread, zmq::socket_t> zthread_fork(zmq::context_t& ctx, void (*thread_func)(zmq::context_t*)) {
// create the pipe socket for the main thread to communicate with its child thread
zmq::socket_t pipe(ctx, ZMQ_PAIR);
pipe.connect("inproc://titanic_request");
// start child thread
std::thread t(thread_func, &ctx);
return std::make_pair(std::move(t), std::move(pipe));
}
int main(int argc, char *argv[]) {
// std::string uuid = generateUUID();
// std::cout << "Generated UUID: " << uuid << std::endl;
// return 0;
int verbose = (argc > 1 && strcmp(argv[1], "-v") == 0);
zmq::context_t ctx(1);
// start the child threads
auto [titanic_request_thread, request_pipe] = zthread_fork(ctx, titanic_request);
std::thread titanic_reply_thread(titanic_reply, &ctx);
titanic_reply_thread.detach();
std::thread titanic_close_thread(titanic_close, &ctx);
titanic_close_thread.detach();
if (verbose) {
std::cout << "I: all service threads started(request, reply, close)" << std::endl;
}
// Main dispatcher loop
while (true) {
// We'll dispatch once per second, if there's no activity
zmq::pollitem_t items[] = {
{request_pipe, 0, ZMQ_POLLIN, 0}
};
try {
zmq::poll(items, 1, 1000 * ZMQ_POLL_MSEC);
} catch(...) {
break; // Interrupted
}
std::filesystem::path titanic_dir(TITANIC_DIR);
if (items[0].revents & ZMQ_POLLIN) {
// Ensure message directory exists
std::cout << "I: creating " << TITANIC_DIR << " directory" << std::endl;
std::filesystem::create_directory(titanic_dir);
// Append UUID to queue, prefixed with '-' for pending
zmsg *msg = new zmsg(request_pipe);
if (!msg) {
break; // Interrupted
}
std::ofstream ofs(titanic_dir / "queue", std::ios::app); // create if not exist, append
std::string uuid = std::string((char *)msg->pop_front().c_str());
ofs << "-" << uuid << std::endl;
delete msg;
}
// Brute force dispatcher
// std::array<char, 33> entry; // "?.......:.......:.......:.......:"
std::string line;
bool need_commit = false;
std::vector<std::string> new_lines;
std::ifstream file(titanic_dir / "queue");
if (!file.is_open()) {
if (verbose) {
std::cout << "I: queue file not open" << std::endl;
}
continue;
}
if (verbose) {
std::cout << "I: read from queue file" << std::endl;
}
while (std::getline(file, line)) {
if (line[0] == '-') {
std::string uuid = line.substr(1, 32);
if (verbose) {
std::cout << "I: processing request " << uuid << std::endl;
}
if (s_service_success(uuid)) {
line[0] = '+'; // Mark completed
need_commit = true;
}
}
new_lines.push_back(line);
}
file.close();
// Commit update
if (need_commit) {
std::ofstream outfile(titanic_dir / "queue");
if (!outfile.is_open()) {
std::cerr << "I: unable to open queue file" << std::endl;
return 1;
}
for (const auto &line : new_lines) {
outfile << line << std::endl;
}
outfile.close();
}
}
return0;
}
// .split try to call a service
// Here, we first check if the requested MDP service is defined or not,
// using a MMI lookup to the Majordomo broker. If the service exists,
// we send a request and wait for a reply using the conventional MDP
// client API. This is not meant to be fast, just very simple:
static bool s_service_success(std::string uuid) {
// Load request message, service will be first frame
std::filesystem::path request_filepath(s_request_filename(uuid));
std::ifstream ifs(request_filepath);
// If the client already closed request, treat as successful
if (!ifs) {
return true;
}
zmsg *request = s_zmsg_load(request_filepath);
std::string service_name((char *)request->pop_front().c_str()); // copy; the ustring returned by pop_front() is a temporary
// Create MDP client session with short timeout
mdcli client(BROKER_ENDPOINT, 1);
client.set_timeout(1000); // 1 sec
client.set_retries(1);
// Use MMI protocol to check if service is available
zmsg *mmi_request = new zmsg();
mmi_request->push_back(service_name.c_str());
zmsg *mmi_reply = client.send("mmi.service", mmi_request);
bool service_ok = (mmi_reply && strcmp(mmi_reply->address(), "200")==0);
delete mmi_reply;
bool result = false;
if (service_ok) {
zmsg *reply = client.send(service_name, request);
if (reply) {
std::filesystem::path reply_filepath(s_reply_filename(uuid));
s_zmsg_save(reply, reply_filepath);
result = true;
}
delete reply;
} else {
std::cout << "service not available: " << service_name << std::endl;
delete request;
}
return result;
}
titanic: Titanic broker example in Haxe
package ;
import haxe.Stack;
import neko.Lib;
import neko.Sys;
import haxe.io.Input;
import neko.FileSystem;
import neko.io.File;
import neko.io.FileInput;
import org.zeromq.ZContext;
import org.zeromq.ZMQ;
import org.zeromq.ZMQPoller;
import org.zeromq.ZMQSocket;
import org.zeromq.ZMsg;
import org.zeromq.ZThread;
import org.zeromq.ZMQException;
/**
 * Titanic service
 * Implements server side of http://rfc.zeromq.org/spec:9
 * @author Richard Smith
 */
class Titanic
{
/** Connection string to broker */
private var broker:String;
/** Print activity to stdout */
private var verbose:Bool;
/** Logger function used in verbose mode */
private var log:Dynamic->Void;
private static inline var UID = "0123456789ABCDEF";
private static inline var TITANIC_DIR = ".titanic";
/**
 * Main method
 */
public static function main() {
Lib.println("** Titanic (see: http://zguide.zeromq.org/page:all#Disconnected-Reliability-Titanic-Pattern)");
var argArr = Sys.args();
var verbose = (argArr.length > 1 && argArr[argArr.length - 1] == "-v");
var log = Lib.println;
var ctx = new ZContext();
// Create Titanic worker class
var titanic = new Titanic("tcp://localhost:5555", verbose);
// Create MDP client session with short timeout
var client = new MDCliAPI("tcp://localhost:5555", verbose);
client.timeout = 1000; // 1 sec
client.retries = 1; // Only 1 retry
var requestPipe = ZThread.attach(ctx, titanic.titanicRequest, "tcp://localhost:5555");
ZThread.detach(titanic.titanicReply, "tcp://localhost:5555");
ZThread.detach(titanic.titanicClose, "tcp://localhost:5555");
var poller = new ZMQPoller();
poller.registerSocket(requestPipe, ZMQ.ZMQ_POLLIN());
// Main dispatcher loop
while (true) {
// We'll dispatch once per second, if there's no activity
try {
var res = poller.poll(1000 * 1000); // 1 sec
} catch (e:ZMQException) {
if (!ZMQ.isInterrupted()) {
trace("ZMQException #:" + e.errNo + ", str:" + e.str());
trace (Stack.toString(Stack.exceptionStack()));
} else
log("W: interrupt received, sinking the titanic...");
ctx.destroy();
client.destroy();
return;
}
if (poller.pollin(1)) {
// Ensure message directory exists
if (!FileSystem.exists(TITANIC_DIR))
FileSystem.createDirectory(TITANIC_DIR);
// Append UUID to queue, prefixed with "-" for pending
var msg = ZMsg.recvMsg(requestPipe);
if (msg == null)
break; // Interrupted
var file = File.append(TITANIC_DIR + "/queue", false);
var uuid = msg.pop().toString();
file.writeString("-" + uuid);
file.flush();
file.close();
}
// Brute-force dispatcher
if (FileSystem.exists(TITANIC_DIR + "/queue")) {
try {
var filec = File.getContent(TITANIC_DIR + "/queue");
FileSystem.deleteFile(TITANIC_DIR + "/queue");
var fileh = File.write(TITANIC_DIR + "/queue", false);
var index = 0;
while (index+33 <= filec.length) {
var str = filec.substr(index, 33);
var prefix = "-";
// UUID is prefixed with '-' if still waiting
if (str.charAt(0) == "-") {
if (verbose)
log("I: processing request " + str.substr(1));
if (titanic.serviceSuccess(client, str.substr(1))) {
// Mark queue entry as processed
prefix = "+";
}
}
fileh.writeString(prefix + str.substr(1));
index += 33;
}
fileh.flush();
fileh.close();
} catch (e:Dynamic) {
log("E: error reading queue file " +e);
}
}
}
client.destroy();
ctx.destroy();
}
/**
* Constructor
* @param broker
* @param ?verbose
* @param ?logger
 */
public function new(broker:String, ?verbose:Bool, ?logger:Dynamic->Void) {
this.broker = broker;
this.verbose = verbose;
if (logger != null)
log = logger;
else
log = neko.Lib.println;
}
/**
* Returns a new UUID as a printable String
* @param ?size
* @return
 */
private function generateUUID(?size:Int):String {
if (size == null) size = 32;
var nchars = UID.length;
var uid = new StringBuf();
for (i in 0 ... size) {
uid.add(UID.charAt(ZHelpers.randof(nchars-1)));
}
return uid.toString();
}
/**
* Returns request filename for given UUID
* @param uuid
* @return
 */
private function requestFilename(uuid:String):String {
return TITANIC_DIR + "/" + uuid + ".req";
}
/**
* Returns reply filename for given UUID
* @param uuid
* @return
 */
private function replyFilename(uuid:String):String {
return TITANIC_DIR + "/" + uuid + ".rep";
}
/**
* Implements Titanic request service "titanic.request"
* @param ctx
* @param pipe
 */
public function titanicRequest(ctx:ZContext, pipe:ZMQSocket, broker:String) {
var worker = new MDWrkAPI(broker, "titanic.request", verbose);
var reply:ZMsg = null;
while (true) {
if (reply != null) trace("reply object:" + reply.toString());
// Send reply if it's not null
// and then get next request from broker
var request = worker.recv(reply);
if (request == null)
break; // Interrupted, exit
// Ensure message directory exists
if (!FileSystem.exists(TITANIC_DIR))
FileSystem.createDirectory(TITANIC_DIR);
// Generate UUID and save message to disk
var uuid = generateUUID();
var filename = requestFilename(uuid);
var file = File.write(filename, false);
ZMsg.save(request, file);
file.close();
request.destroy();
// Send UUID through to message queue
reply = new ZMsg();
reply.addString(uuid);
reply.send(pipe);
// Now send UUID back to client
// Done by the worker.recv() call at the top of the loop
reply = new ZMsg();
reply.addString("200");
reply.addString(uuid);
}
worker.destroy();
}
/**
* Implements titanic reply service "titanic.reply"
 */
public function titanicReply(broker:String) {
var worker = new MDWrkAPI(broker, "titanic.reply", verbose);
var reply:ZMsg = null;
while (true) {
// Send reply if it's not null
// and then get next request from broker
var request = worker.recv(reply);
if (request == null)
break; // Interrupted, exit
// Ensure message directory exists
if (!FileSystem.exists(TITANIC_DIR))
FileSystem.createDirectory(TITANIC_DIR);
// Get the UUID from the request
var uuid = request.popString();
var reqfilename = requestFilename(uuid);
var repfilename = replyFilename(uuid);
if (FileSystem.exists(repfilename)) {
var file = File.read(repfilename, false);
reply = ZMsg.load(file);
reply.pushString("200");
file.close();
} else {
reply = new ZMsg();
if (FileSystem.exists(reqfilename))
reply.pushString("300"); // Pendingelse
reply.pushString("400");
request.destroy();
}
}
worker.destroy();
}
/**
* Implements titanic close service "titanic.close"
* @param broker
 */
public function titanicClose(broker:String) {
var worker = new MDWrkAPI(broker, "titanic.close", verbose);
var reply:ZMsg = null;
while (true) {
// Send reply if it's not null
// and then get next request from broker
var request = worker.recv(reply);
if (request == null)
break; // Interrupted, exit
// Ensure message directory exists
if (!FileSystem.exists(TITANIC_DIR))
FileSystem.createDirectory(TITANIC_DIR);
// Get the UUID from the request
var uuid = request.popString();
var reqfilename = requestFilename(uuid);
var repfilename = replyFilename(uuid);
FileSystem.deleteFile(reqfilename);
FileSystem.deleteFile(repfilename);
request.destroy();
reply = new ZMsg();
reply.addString("200");
}
worker.destroy();
}
/**
* Attempt to process a single service request message, return true if successful
* @param client
* @param uuid
* @return
 */
public function serviceSuccess(client:MDCliAPI, uuid:String):Bool {
// Load request message, service will be first frame
var filename = requestFilename(uuid);
var file = File.read(filename, false);
var request:ZMsg = null;
try {
request = ZMsg.load(file);
file.close();
} catch (e:Dynamic) {
log("E: Error loading file:" + filename + ", details:" + e);
return false;
}
var service = request.pop();
var serviceName = service.toString();
// Use MMI protocol to check if service is available
var mmiRequest = new ZMsg();
mmiRequest.add(service);
var mmiReply = client.send("mmi.service", mmiRequest);
var serviceOK = (mmiReply != null && mmiReply.first().streq("200"));
if (serviceOK) {
// Now call requested service and store reply from service
var reply = client.send(serviceName, request);
if (reply != null) {
filename = replyFilename(uuid);
try {
var file = File.write(filename, false);
ZMsg.save(reply, file);
file.close();
return true;
} catch (e:Dynamic) {
log("E: Error writing file:" + filename + ", details:" + e);
return false;
}
}
reply.destroy();
} else
request.destroy();
return false;
}
}
titanic: Titanic broker example in Java
package guide;

import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.UUID;
import org.zeromq.ZContext;
import org.zeromq.ZFrame;
import org.zeromq.ZMQ;
import org.zeromq.ZMQ.Poller;
import org.zeromq.ZMQ.Socket;
import org.zeromq.ZMsg;
import org.zeromq.ZThread;
import org.zeromq.ZThread.IAttachedRunnable;
import org.zeromq.ZThread.IDetachedRunnable;

public class titanic
{
// Return a new UUID as a printable character string
// Caller must free returned string when finished with it
static String generateUUID()
{
return UUID.randomUUID().toString();
}
private static final String TITANIC_DIR = ".titanic";
// Returns freshly allocated request filename for given UUID
private static String requestFilename(String uuid)
{
String filename = String.format("%s/%s.req", TITANIC_DIR, uuid);
return filename;
}
// Returns freshly allocated reply filename for given UUID
private static String replyFilename(String uuid)
{
String filename = String.format("%s/%s.rep", TITANIC_DIR, uuid);
return filename;
}
// .split Titanic request service
// The {{titanic.request}} task waits for requests to this service. It
// writes each request to disk and returns a UUID to the client. The client
// picks up the reply asynchronously using the {{titanic.reply}} service:
static class TitanicRequest implements IAttachedRunnable
{
@Override
public void run(Object[] args, ZContext ctx, Socket pipe)
{
mdwrkapi worker = new mdwrkapi(
"tcp://localhost:5555", "titanic.request", false
);
ZMsg reply = null;
while (true) {
// Send reply if it's not null
// And then get next request from broker
ZMsg request = worker.receive(reply);
if (request == null)
break; // Interrupted, exit
// Ensure message directory exists
new File(TITANIC_DIR).mkdirs();
// Generate UUID and save message to disk
String uuid = generateUUID();
String filename = requestFilename(uuid);
DataOutputStream file = null;
try {
file = new DataOutputStream(new FileOutputStream(filename));
ZMsg.save(request, file);
}
catch (IOException e) {
e.printStackTrace();
}
finally {
try {
if (file != null)
file.close();
}
catch (IOException e) {
}
}
request.destroy();
// Send UUID through to message queue
reply = new ZMsg();
reply.add(uuid);
reply.send(pipe);
// Now send UUID back to client
// Done by the mdwrk_recv() at the top of the loop
reply = new ZMsg();
reply.add("200");
reply.add(uuid);
}
worker.destroy();
}
}
// .split Titanic reply service
// The {{titanic.reply}} task checks if there's a reply for the specified
// request (by UUID), and returns a 200 (OK), 300 (Pending), or 400
// (Unknown) accordingly:
static class TitanicReply implements IDetachedRunnable
{
@Override
public void run(Object[] args)
{
mdwrkapi worker = new mdwrkapi(
"tcp://localhost:5555", "titanic.reply", false
);
ZMsg reply = null;
while (true) {
ZMsg request = worker.receive(reply);
if (request == null)
break; // Interrupted, exit
String uuid = request.popString();
String reqFilename = requestFilename(uuid);
String repFilename = replyFilename(uuid);
if (new File(repFilename).exists()) {
DataInputStream file = null;
try {
file = new DataInputStream(
new FileInputStream(repFilename)
);
reply = ZMsg.load(file);
reply.push("200");
}
catch (IOException e) {
e.printStackTrace();
}
finally {
try {
if (file != null)
file.close();
}
catch (IOException e) {
}
}
}
else {
reply = new ZMsg();
if (new File(reqFilename).exists())
reply.push("300"); //Pending
else reply.push("400"); //Unknown
}
request.destroy();
}
worker.destroy();
}
}
// .split Titanic close task
// The {{titanic.close}} task removes any waiting replies for the request
// (specified by UUID). It's idempotent, so it is safe to call more than
// once in a row:
static class TitanicClose implements IDetachedRunnable
{
@Override
public void run(Object[] args)
{
mdwrkapi worker = new mdwrkapi(
"tcp://localhost:5555", "titanic.close", false
);
ZMsg reply = null;
while (true) {
ZMsg request = worker.receive(reply);
if (request == null)
break; // Interrupted, exit
String uuid = request.popString();
String req_filename = requestFilename(uuid);
String rep_filename = replyFilename(uuid);
new File(rep_filename).delete();
new File(req_filename).delete();
request.destroy();
reply = new ZMsg();
reply.add("200");
}
worker.destroy();
}
}
// .split worker task
// This is the main thread for the Titanic worker. It starts three child
// threads; for the request, reply, and close services. It then dispatches
// requests to workers using a simple brute force disk queue. It receives
// request UUIDs from the {{titanic.request}} service, saves these to a
// disk file, and then throws each request at MDP workers until it gets a
// response.
public static void main(String[] args)
{
boolean verbose = (args.length > 0 && "-v".equals(args[0]));
try (ZContext ctx = new ZContext()) {
Socket requestPipe = ZThread.fork(ctx, new TitanicRequest());
ZThread.start(new TitanicReply());
ZThread.start(new TitanicClose());
Poller poller = ctx.createPoller(1);
poller.register(requestPipe, ZMQ.Poller.POLLIN);
// Main dispatcher loop
while (true) {
// We'll dispatch once per second, if there's no activity
int rc = poller.poll(1000);
if (rc == -1)
break; // Interrupted
if (poller.pollin(0)) {
// Ensure message directory exists
new File(TITANIC_DIR).mkdirs();
// Append UUID to queue, prefixed with '-' for pending
ZMsg msg = ZMsg.recvMsg(requestPipe);
if (msg == null)
break; // Interrupted
String uuid = msg.popString();
BufferedWriter wfile = null;
try {
wfile = new BufferedWriter(
new FileWriter(TITANIC_DIR + "/queue", true)
);
wfile.write("-" + uuid + "\n");
}
catch (IOException e) {
e.printStackTrace();
break;
}
finally {
try {
if (wfile != null)
wfile.close();
}
catch (IOException e) {
}
}
msg.destroy();
}
// Brute force dispatcher
// "?........:....:....:....:............:";
byte[] entry = new byte[37];
RandomAccessFile file = null;
try {
file = new RandomAccessFile(TITANIC_DIR + "/queue", "rw");
while (file.read(entry) > 0) {
// UUID is prefixed with '-' if still waiting
if (entry[0] == '-') {
if (verbose)
System.out.printf(
"I: processing request %s\n",
new String(
entry, 1, entry.length - 1, ZMQ.CHARSET
)
);
if (serviceSuccess(
new String(
entry, 1, entry.length - 1, ZMQ.CHARSET
)
)) {
// Mark queue entry as processed
file.seek(file.getFilePointer() - 37);
file.writeBytes("+");
file.seek(file.getFilePointer() + 36);
}
}
// Skip end of line, LF or CRLF
if (file.readByte() == '\r')
file.readByte();
if (Thread.currentThread().isInterrupted())
break;
}
}
catch (FileNotFoundException e) {
}
catch (IOException e) {
e.printStackTrace();
}
finally {
if (file != null) {
try {
file.close();
}
catch (IOException e) {
}
}
}
}
}
}
// .split try to call a service
// Here, we first check if the requested MDP service is defined or not,
// using a MMI lookup to the Majordomo broker. If the service exists, we
// send a request and wait for a reply using the conventional MDP client
// API. This is not meant to be fast, just very simple:
static boolean serviceSuccess(String uuid)
{
// Load request message, service will be first frame
String filename = requestFilename(uuid);
// If the client already closed request, treat as successful
if (!new File(filename).exists())
return true;
DataInputStream file = null;
ZMsg request;
try {
file = new DataInputStream(new FileInputStream(filename));
request = ZMsg.load(file);
}
catch (IOException e) {
e.printStackTrace();
return true;
}
finally {
try {
if (file != null)
file.close();
}
catch (IOException e) {
}
}
ZFrame service = request.pop();
String serviceName = service.toString();
// Create MDP client session with short timeout
mdcliapi client = new mdcliapi("tcp://localhost:5555", false);
client.setTimeout(1000); // 1 sec
client.setRetries(1); // only 1 retry
// Use MMI protocol to check if service is available
ZMsg mmiRequest = new ZMsg();
mmiRequest.add(service);
ZMsg mmiReply = client.send("mmi.service", mmiRequest);
boolean serviceOK = (mmiReply != null &&
mmiReply.getFirst().toString().equals("200"));
if (mmiReply != null)
    mmiReply.destroy();
boolean result = false;
if (serviceOK) {
ZMsg reply = client.send(serviceName, request);
if (reply != null) {
filename = replyFilename(uuid);
DataOutputStream ofile = null;
try {
ofile = new DataOutputStream(new FileOutputStream(filename));
ZMsg.save(reply, ofile);
}
catch (IOException e) {
e.printStackTrace();
return true;
}
finally {
    try {
        if (ofile != null)
            ofile.close(); // Close the reply file we just wrote
    }
    catch (IOException e) {
    }
}
result = true;
reply.destroy();
}
}
else request.destroy();
client.destroy();
return result;
}
}
## Titanic service
## Implements server side of http://rfc.zeromq.org/spec:9
lappend auto_path .
package require MDClient 1.0
package require MDWorker 1.0
package require uuid
if{[llength$argv] == 0}{set argv [list driver]}elseif{[llength$argv] != 1}{puts"Usage: titanic.tcl <driver|request|reply|close>"exit1}set tclsh [info nameofexecutable]expr{srand([pid])}set verbose 0lassign$argv what
set TITANIC_DIR ".titanic"# Return a new UUID as a printable character string
proc s_generate_uuid {}{return[uuid::uuid generate]}# Returns freshly allocated request filename for given UUID
proc s_request_filename {uuid}{return[file join $::TITANIC_DIR$uuid.req]}# Returns freshly allocated reply filename for given UUID
proc s_reply_filename {uuid}{return[file join $::TITANIC_DIR$uuid.rep]}# Titanic request service
proc titanic_request {}{zmq context context
set pipe [zmq socket pipe context PAIR]pipe connect "ipc://titanicpipe.ipc"set worker [MDWorker new "tcp://localhost:5555""titanic.request"$::verbose]set reply {}while{1}{# Send reply if it's not null
# And then get next request from broker
set request [$workerrecv$reply]if{[llength$request] == 0}{break;# Interrupted, exit
}# Ensure message directory exists
file mkdir $::TITANIC_DIR# Generate UUID and save message to disk
set uuid [s_generate_uuid]set filename [s_request_filename$uuid]set file [open$filename"w"]puts -nonewline $file[join$request\n]close$file# Send UUID through to message queue
set reply [list]set reply [zmsg add $reply$uuid]zmsg send $pipe$reply# Now send UUID back to client
# Done by the mdwrk_recv() at the top of the loop
set reply [list]puts"I: titanic.request to $uuid / $reply"set reply [zmsg add $reply"200"]puts"I: titanic.request to $uuid / $reply"set reply [zmsg add $reply$uuid]puts"I: titanic.request to $uuid / $reply"puts[join[zmsg dump $reply]\n]}$workerdestroy}# Titanic reply service
proc titanic_reply {}{set worker [MDWorker new "tcp://localhost:5555""titanic.reply"$::verbose]set reply {}while{1}{set request [$workerrecv$reply]if{[llength$request] == 0}{break}set uuid [zmsg pop request]set req_filename [s_request_filename$uuid]set rep_filename [s_reply_filename$uuid]if{[file exists $rep_filename]}{set file [open$rep_filename r]set reply [split[read$file]\n]set reply [zmsg push $reply"200"]puts"I: titanic.reply to $uuid"puts[join[zmsg dump $reply]\n]close$file}else{if{[file exists $req_filename]}{set reply "300"}else{set reply "400"}}}$workerdestroyreturn0}# Titanic close service
proc titanic_close {}{set worker [MDWorker new "tcp://localhost:5555""titanic.close"$::verbose]set reply ""while{1}{set request [$workerrecv$reply]if{[llength$request] == 0}{break}set uuid [zmsg pop request]set req_filename [s_request_filename$uuid]set rep_filename [s_reply_filename$uuid]file delete -force $req_filenamefile delete -force $rep_filenameset reply "200"}$workerdestroyreturn0}# Attempt to process a single request, return 1 if successful
proc s_service_success {uuid}{# Load request message, service will be first frame
set filename [s_request_filename$uuid]# If the client already closed request, treat as successful
if{![file exists $filename]}{return1}set file [open$filename"r"]set request [split[read$file]\n]set service [zmsg pop request]# Create MDP client session with short timeout
set client [MDClient new "tcp://localhost:5555"$::verbose]$clientset_timeout1000$clientset_retries1# Use MMI protocol to check if service is available
set mmi_request {}set mmi_request [zmsg add $mmi_request$service]set mmi_reply [$clientsend"mmi.service"$mmi_request]if{[lindex$mmi_reply0]eq"200"}{set reply [$clientsend$service$request]if{[llength$reply]}{set filename [s_reply_filename$uuid]set file [open$filename"w"]puts -nonewline $file[join$reply\n]close$filereturn1}}$clientdestroyreturn0}switch -exact -- $what{request{titanic_request}reply{titanic_reply}close{titanic_close}driver{exec$tclsh titanic.tcl request > request.log 2>@1 &
exec$tclsh titanic.tcl reply > reply.log 2>@1 &
exec$tclsh titanic.tcl close > close.log 2>@1 &
after1000;# Wait for other parts to start
zmq context context
zmq socket request_pipe context PAIR
request_pipe bind "ipc://titanicpipe.ipc"set queuefnm [file join $::TITANIC_DIR queue]# Main dispatcher loop
while{1}{# We'll dispatch once per second, if there's no activity
set poll_set [list[list request_pipe [list POLLIN]]]set rpoll_set [zmq poll $poll_set1000]if{[llength$rpoll_set] && "POLLIN"in[lindex$rpoll_set01]}{# Ensure message directory exists
file mkdir $::TITANIC_DIR# Append UUID to queue, prefixed with '-' for pending
set msg [zmsg recv request_pipe]if{[llength$msg] == 0}{break}set file [open$queuefnm"a"]set uuid [zmsg pop msg]puts$file"-$uuid"close$file}# Brute-force dispatcher
if{[file exists $queuefnm]}{set file [open$queuefnm"r"]set queue_list [split[read$file]\n]close$filefor{set i 0}{$i < [llength$queue_list]}{incr i}{set entry [lindex$queue_list$i]if{[string match "-*"$entry]}{set entry [string range $entry1 end]puts"I: processing request $entry"if{[s_service_success$entry]}{lset queue_list $i"+$entry"}}}set file [open$queuefnm"w"]puts -nonewline $file[join$queue_list\n]close$file}}return0}}
To test this, start mdbroker and titanic, and then run ticlient. Now start mdworker arbitrarily, and you should see the client getting a response and exiting happily.
Some notes about this code:
Note that some loops start by sending, others by receiving messages. This is because Titanic acts both as a client and a worker in different roles.
The Titanic broker uses the MMI service discovery protocol to send requests only to services that appear to be running. Since the MMI implementation in our little Majordomo broker is quite poor, this won’t work all the time.
We use an inproc connection to send new request data from the titanic.request service through to the main dispatcher. This saves the dispatcher from having to scan the disk directory, load all request files, and sort them by date/time.
The important thing about this example is not performance (which, although I haven’t tested it, is surely terrible), but how well it implements the reliability contract. To try it, start the mdbroker and titanic programs. Then start the ticlient, and then start the mdworker echo service. You can run all four of these using the -v option to do verbose activity tracing. You can stop and restart any piece except the client and nothing will get lost.
If you want to use Titanic in real cases, you’ll rapidly be asking “how do we make this faster?”
Here’s what I’d do, starting with the example implementation:
Use a single disk file for all data, rather than multiple files. Operating systems are usually better at handling a few large files than many smaller ones.
Organize that disk file as a circular buffer so that new requests can be written contiguously (with very occasional wraparound). One thread, writing full speed to a disk file, can work rapidly.
Keep the index in memory and rebuild the index at startup time, from the disk buffer (see the sketch below). This saves the extra disk head flutter needed to keep the index fully safe on disk. You would want an fsync after every message, or every N milliseconds if you were prepared to lose the last M messages in case of a system failure.
Use a solid-state drive rather than spinning iron oxide platters.
Pre-allocate the entire file, or allocate it in large chunks, which allows the circular buffer to grow and shrink as needed. This avoids fragmentation and ensures that most reads and writes are contiguous.
And so on. What I’d not recommend is storing messages in a database, not even a “fast” key/value store, unless you really like a specific database and don’t have performance worries. You will pay a steep price for the abstraction, ten to a thousand times over a raw disk file.
If you want to make Titanic even more reliable, duplicate the requests to a second server, which you'd place in a second location just far enough away to survive a nuclear attack on your primary location, yet not so far that you get too much latency.
If you want to make Titanic much faster and less reliable, store requests and replies purely in memory. This will give you the functionality of a disconnected network, but requests won’t survive a crash of the Titanic server itself.
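As a rough illustration of that trade-off, the request and reply files could be replaced by maps keyed by UUID. This is a sketch only; the class TitanicStore and its methods are illustrative names, not part of the example code:
// A sketch of an in-memory store that could stand in for the request/reply
// files: requests and replies are kept in maps keyed by UUID, so nothing
// survives a restart of the Titanic server.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.zeromq.ZMsg;

class TitanicStore
{
    private final Map<String, ZMsg> requests = new ConcurrentHashMap<>();
    private final Map<String, ZMsg> replies  = new ConcurrentHashMap<>();

    void saveRequest(String uuid, ZMsg request) { requests.put(uuid, request); }
    void saveReply(String uuid, ZMsg reply)     { replies.put(uuid, reply); }
    ZMsg reply(String uuid)                     { return replies.get(uuid); }
    boolean hasRequest(String uuid)             { return requests.containsKey(uuid); }

    // titanic.close: forget everything about this UUID
    void close(String uuid)
    {
        requests.remove(uuid);
        replies.remove(uuid);
    }
}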
Figure 52 - High-Availability Pair, Normal Operation
The Binary Star pattern puts two servers in a primary-backup high-availability pair. At any given time, one of these (the active) accepts connections from client applications. The other (the passive) does nothing, but the two servers monitor each other. If the active disappears from the network, after a certain time the passive takes over as active.
We developed the Binary Star pattern at iMatix for our OpenAMQ server. We designed it:
To provide a straightforward high-availability solution.
To be simple enough to actually understand and use.
To fail over reliably when needed, and only when needed.
Assuming we have a Binary Star pair running, here are the different scenarios that will result in a failover:
The hardware running the primary server has a fatal problem (power supply explodes, machine catches fire, or someone simply unplugs it by mistake), and disappears. Applications see this, and reconnect to the backup server.
The network segment on which the primary server sits crashes–perhaps a router gets hit by a power spike–and applications start to reconnect to the backup server.
The primary server crashes or is killed by the operator and does not restart automatically.
Figure 53 - High-availability Pair During Failover
Recovery from failover works as follows:
The operators restart the primary server and fix whatever problems were causing it to disappear from the network.
The operators stop the backup server at a moment when it will cause minimal disruption to applications.
When applications have reconnected to the primary server, the operators restart the backup server.
Recovery (to using the primary server as active) is a manual operation. Painful experience teaches us that automatic recovery is undesirable. There are several reasons:
Failover creates an interruption of service to applications, possibly lasting 10-30 seconds. If there is a real emergency, this is much better than total outage. But if recovery creates a further 10-30 second outage, it is better that this happens off-peak, when users have gone off the network.
When there is an emergency, the absolute first priority is certainty for those trying to fix things. Automatic recovery creates uncertainty for system administrators, who can no longer be sure which server is in charge without double-checking.
Automatic recovery can create situations where networks fail over and then recover, placing operators in the difficult position of analyzing what happened. There was an interruption of service, but the cause isn’t clear.
Having said this, the Binary Star pattern will fail back to the primary server if this is running (again) and the backup server fails. In fact, this is how we provoke recovery.
The shutdown process for a Binary Star pair is to either:
Stop the passive server and then stop the active server at any later time, or
Stop both servers in any order but within a few seconds of each other.
Stopping the active and then the passive server with any delay longer than the failover timeout will cause applications to disconnect, then reconnect, and then disconnect again, which may disturb users.
Binary Star is as simple as it can be, while still working accurately. In fact, the current design is the third complete redesign. Each of the previous designs we found to be too complex, trying to do too much, and we stripped out functionality until we came to a design that was understandable, easy to use, and reliable enough to be worth using.
These are our requirements for a high-availability architecture:
The failover is meant to provide insurance against catastrophic system failures, such as hardware breakdown, fire, accident, and so on. There are simpler ways to recover from ordinary server crashes and we already covered these.
Failover time should be under 60 seconds and preferably under 10 seconds.
Failover has to happen automatically, whereas recovery must happen manually. We want applications to switch over to the backup server automatically, but we do not want them to switch back to the primary server except when the operators have fixed whatever problem there was and decided that it is a good time to interrupt applications again.
The semantics for client applications should be simple and easy for developers to understand. Ideally, they should be hidden in the client API.
There should be clear instructions for network architects on how to avoid designs that could lead to split brain syndrome, in which both servers in a Binary Star pair think they are the active server.
There should be no dependencies on the order in which the two servers are started.
It must be possible to make planned stops and restarts of either server without stopping client applications (though they may be forced to reconnect).
Operators must be able to monitor both servers at all times.
It must be possible to connect the two servers using a high-speed dedicated network connection. That is, failover synchronization must be able to use a specific IP route.
We make the following assumptions:
A single backup server provides enough insurance; we don’t need multiple levels of backup.
The primary and backup servers are equally capable of carrying the application load. We do not attempt to balance load across the servers.
There is sufficient budget to cover a fully redundant backup server that does nothing almost all the time.
We don’t attempt to cover the following:
The use of an active backup server or load balancing. In a Binary Star pair, the backup server is inactive and does no useful work until the primary server goes offline.
The handling of persistent messages or transactions in any way. We assume the existence of a network of unreliable (and probably untrusted) servers or Binary Star pairs.
Any automatic exploration of the network. The Binary Star pair is manually and explicitly defined in the network and is known to applications (at least in their configuration data).
Replication of state or messages between servers. All server-side state must be recreated by applications when they fail over.
Here is the key terminology that we use in Binary Star:
Primary: the server that is normally or initially active.
Backup: the server that is normally passive. It will become active if and when the primary server disappears from the network, and when client applications ask the backup server to connect.
Active: the server that accepts client connections. There is at most one active server.
Passive: the server that takes over if the active disappears. Note that when a Binary Star pair is running normally, the primary server is active, and the backup is passive. When a failover has happened, the roles are switched.
To configure a Binary Star pair, you need to:
Tell the primary server where the backup server is located.
Tell the backup server where the primary server is located.
Optionally, tune the failover response times, which must be the same for both servers.
The main tuning concern is how frequently you want the servers to check their peering status, and how quickly you want to activate failover. In our example, the failover timeout value defaults to 2,000 msec. If you reduce this, the backup server will take over as active more rapidly but may take over in cases where the primary server could recover. For example, you may have wrapped the primary server in a shell script that restarts it if it crashes. In that case, the timeout should be higher than the time needed to restart the primary server.
For client applications to work properly with a Binary Star pair, they must:
Know both server addresses.
Try to connect to the primary server, and if that fails, to the backup server.
Detect a failed connection, typically using heartbeating.
Try to reconnect to the primary, and then backup (in that order), with a delay between retries that is at least as high as the server failover timeout.
Recreate all of the state they require on a server.
Retransmit messages lost during a failover, if messages need to be reliable.
It’s not trivial work, and we’d usually wrap this in an API that hides it from real end-user applications.
These are the main limitations of the Binary Star pattern:
A server process cannot be part of more than one Binary Star pair.
A primary server can have a single backup server, and no more.
The passive server does no useful work, and is thus wasted.
The backup server must be capable of handling full application loads.
Failover configuration cannot be modified at runtime.
Client applications must do some work to benefit from failover.
Split-brain syndrome occurs when different parts of a cluster think they are active at the same time. It causes applications to stop seeing each other. Binary Star has an algorithm for detecting and eliminating split brain, which is based on a three-way decision mechanism (a server will not decide to become active until it gets application connection requests and it cannot see its peer server).
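Expressed as code, the rule is tiny. Here is an illustrative sketch; the method name and parameters are not from the example, which embeds the same test in its state machine:
// Illustrative sketch of the three-way decision described above: a passive
// server only promotes itself when a client request arrives AND it has not
// heard from its peer for longer than the failover timeout.
static boolean shouldBecomeActive(boolean clientRequestArrived, long now, long peerExpiry)
{
    return clientRequestArrived && now >= peerExpiry;
}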
However, it is still possible to (mis)design a network to fool this algorithm. A typical scenario would be a Binary Star pair distributed between two buildings, where each building also has a set of applications, and where there is a single network link between the two buildings. Breaking this link would create two sets of client applications, each with half of the Binary Star pair, and each failover server would become active.
To prevent split-brain situations, we must connect a Binary Star pair using a dedicated network link, which can be as simple as plugging them both into the same switch or, better, using a crossover cable directly between two machines.
We must not split a Binary Star architecture into two islands, each with a set of applications. While this may be a common type of network architecture, you should use federation, not high-availability failover, in such cases.
A suitably paranoid network configuration would use two private cluster interconnects, rather than a single one. Further, the network cards used for the cluster would be different from those used for message traffic, and possibly even on different paths on the server hardware. The goal is to separate possible failures in the network from possible failures in the cluster. Network ports can have a relatively high failure rate.
Without further ado, here is a proof-of-concept implementation of the Binary Star server. The primary and backup servers run the same code; you choose their roles when you run the code:
// Binary Star server proof-of-concept implementation. This server does no
// real work; it just demonstrates the Binary Star failover model.
#include "czmq.h"

// States we can be in at any point in time
typedef enum {
    STATE_PRIMARY = 1,          // Primary, waiting for peer to connect
    STATE_BACKUP = 2,           // Backup, waiting for peer to connect
    STATE_ACTIVE = 3,           // Active - accepting connections
    STATE_PASSIVE = 4           // Passive - not accepting connections
} state_t;

// Events, which start with the states our peer can be in
typedef enum {
    PEER_PRIMARY = 1,           // HA peer is pending primary
    PEER_BACKUP = 2,            // HA peer is pending backup
    PEER_ACTIVE = 3,            // HA peer is active
    PEER_PASSIVE = 4,           // HA peer is passive
    CLIENT_REQUEST = 5          // Client makes request
} event_t;

// Our finite state machine
typedef struct {
state_t state; // Current state
event_t event; // Current event
int64_t peer_expiry; // When peer is considered 'dead'
} bstar_t;
// We send state information this often
// If peer doesn't respond in two heartbeats, it is 'dead'
#define HEARTBEAT 1000 // In msecs
// .split Binary Star state machine
// The heart of the Binary Star design is its finite-state machine (FSM).
// The FSM runs one event at a time. We apply an event to the current state,
// which checks if the event is accepted, and if so, sets a new state:
static bool s_state_machine (bstar_t *fsm)
{
bool exception = false;
// These are the PRIMARY and BACKUP states; we're waiting to become
// ACTIVE or PASSIVE depending on events we get from our peer:
if (fsm->state == STATE_PRIMARY) {
if (fsm->event == PEER_BACKUP) {
printf ("I: connected to backup (passive), ready active\n");
fsm->state = STATE_ACTIVE;
}
else if (fsm->event == PEER_ACTIVE) {
printf ("I: connected to backup (active), ready passive\n");
fsm->state = STATE_PASSIVE;
}
// Accept client connections
}
else if (fsm->state == STATE_BACKUP) {
if (fsm->event == PEER_ACTIVE) {
printf ("I: connected to primary (active), ready passive\n");
fsm->state = STATE_PASSIVE;
}
else // Reject client connections when acting as backup
if (fsm->event == CLIENT_REQUEST)
exception = true;
}
else // .split active and passive states
// These are the ACTIVE and PASSIVE states:
if (fsm->state == STATE_ACTIVE) {
if (fsm->event == PEER_ACTIVE) {
// Two actives would mean split-brain
printf ("E: fatal error - dual actives, aborting\n");
exception = true;
}
}
else // Server is passive
// CLIENT_REQUEST events can trigger failover if peer looks dead
if (fsm->state == STATE_PASSIVE) {
if (fsm->event == PEER_PRIMARY) {
// Peer is restarting - become active, peer will go passive
printf ("I: primary (passive) is restarting, ready active\n");
fsm->state = STATE_ACTIVE;
}
else if (fsm->event == PEER_BACKUP) {
// Peer is restarting - become active, peer will go passive
printf ("I: backup (passive) is restarting, ready active\n");
fsm->state = STATE_ACTIVE;
}
else if (fsm->event == PEER_PASSIVE) {
// Two passives would mean cluster would be non-responsive
printf ("E: fatal error - dual passives, aborting\n");
exception = true;
}
else if (fsm->event == CLIENT_REQUEST) {
// Peer becomes active if timeout has passed
// It's the client request that triggers the failover
assert (fsm->peer_expiry > 0);
if (zclock_time () >= fsm->peer_expiry) {
// If peer is dead, switch to the active state
printf ("I: failover successful, ready active\n");
fsm->state = STATE_ACTIVE;
}
else // If peer is alive, reject connections
exception = true;
}
}
return exception;
}
// .split main task
// This is our main task. First we bind/connect our sockets with our
// peer and make sure we will get state messages correctly. We use
// three sockets; one to publish state, one to subscribe to state, and
// one for client requests/replies:
int main (int argc, char *argv [])
{
// Arguments can be either of:
// -p primary server, at tcp://localhost:5001
// -b backup server, at tcp://localhost:5002
zctx_t *ctx = zctx_new ();
void *statepub = zsocket_new (ctx, ZMQ_PUB);
void *statesub = zsocket_new (ctx, ZMQ_SUB);
zsocket_set_subscribe (statesub, "");
void *frontend = zsocket_new (ctx, ZMQ_ROUTER);
bstar_t fsm = { 0 };
if (argc == 2 && streq (argv [1], "-p")) {
printf ("I: Primary active, waiting for backup (passive)\n");
zsocket_bind (frontend, "tcp://*:5001");
zsocket_bind (statepub, "tcp://*:5003");
zsocket_connect (statesub, "tcp://localhost:5004");
fsm.state = STATE_PRIMARY;
}
else if (argc == 2 && streq (argv [1], "-b")) {
printf ("I: Backup passive, waiting for primary (active)\n");
zsocket_bind (frontend, "tcp://*:5002");
zsocket_bind (statepub, "tcp://*:5004");
zsocket_connect (statesub, "tcp://localhost:5003");
fsm.state = STATE_BACKUP;
}
else {
printf ("Usage: bstarsrv { -p | -b }\n");
zctx_destroy (&ctx);
exit (0);
}
// .split handling socket input
// We now process events on our two input sockets, and process these
// events one at a time via our finite-state machine. Our "work" for
// a client request is simply to echo it back:
// Set timer for next outgoing state message
int64_t send_state_at = zclock_time () + HEARTBEAT;
while (!zctx_interrupted) {
zmq_pollitem_t items [] = {
{ frontend, 0, ZMQ_POLLIN, 0 },
{ statesub, 0, ZMQ_POLLIN, 0 }
};
int time_left = (int) ((send_state_at - zclock_time ()));
if (time_left < 0)
time_left = 0;
int rc = zmq_poll (items, 2, time_left * ZMQ_POLL_MSEC);
if (rc == -1)
break; // Context has been shut down
if (items [0].revents & ZMQ_POLLIN) {
// Have a client request
zmsg_t *msg = zmsg_recv (frontend);
fsm.event = CLIENT_REQUEST;
if (s_state_machine (&fsm) == false)
// Answer client by echoing request back
zmsg_send (&msg, frontend);
else
zmsg_destroy (&msg);
}
if (items [1].revents & ZMQ_POLLIN) {
// Have state from our peer, execute as event
char *message = zstr_recv (statesub);
fsm.event = atoi (message);
free (message);
if (s_state_machine (&fsm))
break; // Error, so exit
fsm.peer_expiry = zclock_time () + 2 * HEARTBEAT;
}
// If we timed out, send state to peer
if (zclock_time () >= send_state_at) {
char message [2];
sprintf (message, "%d", fsm.state);
zstr_send (statepub, message);
send_state_at = zclock_time () + HEARTBEAT;
}
}
if (zctx_interrupted)
printf ("W: interrupted\n");
// Shutdown sockets and context
zctx_destroy (&ctx);
return 0;
}
bstarsrv: Binary Star server in C++
// Binary Star server proof-of-concept implementation. This server does no
// real work; it just demonstrates the Binary Star failover model.
#include "zmsg.hpp"

#define ZMQ_POLL_MSEC 1         // One second

// States we can be in at any point in time
typedef enum {
    STATE_NOTSET = 0,           // Before we start, or if the state is invalid
    STATE_PRIMARY = 1,          // Primary, waiting for peer to connect
    STATE_BACKUP = 2,           // Backup, waiting for peer to connect
    STATE_ACTIVE = 3,           // Active - accepting connections
    STATE_PASSIVE = 4           // Passive - not accepting connections
} state_t;

// Events, which start with the states our peer can be in
typedef enum {
    EVENT_NOTSET = 0,           // Before we start, or if the event is invalid
    PEER_PRIMARY = 1,           // HA peer is pending primary
    PEER_BACKUP = 2,            // HA peer is pending backup
    PEER_ACTIVE = 3,            // HA peer is active
    PEER_PASSIVE = 4,           // HA peer is passive
    CLIENT_REQUEST = 5          // Client makes request
} event_t;
// We send state information this often
// If peer doesn't respond in two heartbeats, it is 'dead'
#define HEARTBEAT 1000 // In msecs
// .split Binary Star state machine
// The heart of the Binary Star design is its finite-state machine (FSM).
// The FSM runs one event at a time. We apply an event to the current state,
// which checks if the event is accepted, and if so, sets a new state:
// Our finite state machine
class bstar {
public:
    bstar() : m_state(STATE_NOTSET), m_event(EVENT_NOTSET), m_peer_expiry(0) {}

    bool state_machine(event_t event) {
m_event = event;
bool exception = false;
// These are the PRIMARY and BACKUP states; we're waiting to become
// ACTIVE or PASSIVE depending on events we get from our peer:
if (m_state == STATE_PRIMARY) {
if (m_event == PEER_BACKUP) {
std::cout << "I: connected to backup (passive), ready active" << std::endl;
m_state = STATE_ACTIVE;
} else if (m_event == PEER_ACTIVE) {
std::cout << "I: connected to backup (active), ready passive" << std::endl;
m_state = STATE_PASSIVE;
}
// Accept client connections
} else if (m_state == STATE_BACKUP) {
if (m_event == PEER_ACTIVE) {
std::cout << "I: connected to primary (active), ready passive" << std::endl;
m_state = STATE_PASSIVE;
} else if (m_event == CLIENT_REQUEST) {
// Reject client connections when acting as backup
exception = true;
}
// .split active and passive states
// These are the ACTIVE and PASSIVE states:
} else if (m_state == STATE_ACTIVE) {
if (m_event == PEER_ACTIVE) {
std::cout << "E: fatal error - dual actives, aborting" << std::endl;
exception = true;
}
// Server is passive
// CLIENT_REQUEST events can trigger failover if peer looks dead
} else if (m_state == STATE_PASSIVE) {
if (m_event == PEER_PRIMARY) {
// Peer is restarting - become active, peer will go passive
std::cout << "I: primary (passive) is restarting, ready active" << std::endl;
m_state = STATE_ACTIVE;
} else if (m_event == PEER_BACKUP) {
// Peer is restarting - become active, peer will go passive
std::cout << "I: backup (passive) is restarting, ready active" << std::endl;
m_state = STATE_ACTIVE;
} else if (m_event == PEER_PASSIVE) {
// Two passives would mean cluster would be non-responsive
std::cout << "E: fatal error - dual passives, aborting" << std::endl;
exception = true;
} else if (m_event == CLIENT_REQUEST) {
// Peer becomes active if timeout has passed
// It's the client request that triggers the failover
assert(m_peer_expiry > 0);
if (s_clock() >= m_peer_expiry) {
std::cout << "I: failover successful, ready active" << std::endl;
m_state = STATE_ACTIVE;
} else {
// If peer is alive, reject connections
exception = true;
}
}
}
return exception;
}
void set_state(state_t state) {
    m_state = state;
}
state_t get_state() {
    return m_state;
}
void set_peer_expiry(int64_t expiry) {
    m_peer_expiry = expiry;
}
private:
state_t m_state; // Current state
event_t m_event; // Current event
int64_t m_peer_expiry; // When peer is considered 'dead', milliseconds
};
int main(int argc, char *argv []) {
// Arguments can be either of:
// -p primary server, at tcp://localhost:5001
// -b backup server, at tcp://localhost:5002
zmq::context_t context(1);
zmq::socket_t statepub(context, ZMQ_PUB);
zmq::socket_t statesub(context, ZMQ_SUB);
statesub.set(zmq::sockopt::subscribe, "");
zmq::socket_t frontend(context, ZMQ_ROUTER);
bstar fsm;
if (argc == 2 && strcmp(argv[1], "-p") == 0) {
std::cout << "I: Primary active, waiting for backup (passive)" << std::endl;
frontend.bind("tcp://*:5001");
statepub.bind("tcp://*:5003");
statesub.connect("tcp://localhost:5004");
fsm.set_state(STATE_PRIMARY);
} else if (argc == 2 && strcmp(argv[1], "-b") == 0) {
std::cout << "I: Backup passive, waiting for primary (active)" << std::endl;
frontend.bind("tcp://*:5002");
statepub.bind("tcp://*:5004");
statesub.connect("tcp://localhost:5003");
fsm.set_state(STATE_BACKUP);
} else {
std::cout << "Usage: bstarsrv { -p | -b }" << std::endl;
return 0;
}
// .split handling socket input
// We now process events on our two input sockets, and process these
// events one at a time via our finite-state machine. Our "work" for
// a client request is simply to echo it back:
// Set timer for next outgoing state message
int64_t send_state_at = s_clock() + HEARTBEAT;
s_catch_signals(); // catch SIGINT and SIGTERM
while(!s_interrupted) {
zmq::pollitem_t items [] = {
{ frontend, 0, ZMQ_POLLIN, 0 },
{ statesub, 0, ZMQ_POLLIN, 0 }
};
int time_left = (int) (send_state_at - s_clock());
if (time_left < 0)
time_left = 0;
try {
zmq::poll(items, 2, time_left * ZMQ_POLL_MSEC);
} catch (zmq::error_t &e) {
break; // Interrupted
}
if (items[0].revents & ZMQ_POLLIN) {
// Have client request, process it
zmsg msg;
msg.recv(frontend);
if (msg.parts() == 0)
break; // Ctrl-C
if (fsm.state_machine(CLIENT_REQUEST) == false) {
// Answer client by echoing request back
msg.send(frontend);
}
}
if (items[1].revents & ZMQ_POLLIN) {
// Have state from our peer, execute as event
std::string message = s_recv(statesub);
std::cout << "I: received state msg:" << message << std::endl;
event_t event = (event_t)std::stoi(message); // peer's state is our event
if (fsm.state_machine(event) == true) {
break; // Error, exit
}
fsm.set_peer_expiry(s_clock() + 2 * HEARTBEAT);
}
// If we timed out, send state to peer
if (s_clock() >= send_state_at) {
std::string state = std::to_string(fsm.get_state());
std::cout << "sending state:" << state << std::endl;
s_send(statepub, state);
// std::cout << "error: " << zmq_strerror(zmq_errno()) << std::endl;
send_state_at = s_clock() + HEARTBEAT;
}
}
if (s_interrupted) {
std::cout << "W: interrupt received, shutting down..." << std::endl;
}
return 0;
}
package ;
import haxe.io.Bytes;
import haxe.Stack;
import neko.Sys;
import neko.Lib;
import org.zeromq.ZContext;
import org.zeromq.ZMQ;
import org.zeromq.ZMQPoller;
import org.zeromq.ZMQException;
import org.zeromq.ZMsg;
import org.zeromq.ZSocket;
/**
* Binary Star Server
* @author Richard J Smith
*
* @see http://zguide.zeromq.org/page:all#Binary-Star-Implementation
 */
class BStarSrv
{
private static inline var HEARTBEAT = 100;

/** Current state */
public var state:StateT;

/** Current event */
public var event:EventT;

/** When peer is considered 'dead' */
public var peerExpiry:Float;
/**
* BStarSrv constructor
* @param state Initial state
 */
public function new(state:StateT) {
this.state = state;
}
/**
* Main binary star server loop
 */
public function run() {
var ctx = new ZContext();
var statePub = ctx.createSocket(ZMQ_PUB);
var stateSub = ctx.createSocket(ZMQ_SUB);
var frontend = ctx.createSocket(ZMQ_ROUTER);
switch (state) {
case STATE_PRIMARY:
Lib.println("I: primary master, waiting for backup (slave)");
frontend.bind("tcp://*:5001");
statePub.bind("tcp://*:5003");
stateSub.setsockopt(ZMQ_SUBSCRIBE, Bytes.ofString(""));
stateSub.connect("tcp://localhost:5004");
case STATE_BACKUP:
Lib.println("I: backup slave, waiting for primary (master)");
frontend.bind("tcp://*:5002");
statePub.bind("tcp://*:5004");
stateSub.setsockopt(ZMQ_SUBSCRIBE, Bytes.ofString(""));
stateSub.connect("tcp://localhost:5003");
default:
ctx.destroy();
return;
}
// Set timer for next outgoing state message
var sendStateAt = Date.now().getTime() + HEARTBEAT;
var poller = new ZMQPoller();
poller.registerSocket(frontend, ZMQ.ZMQ_POLLIN());
poller.registerSocket(stateSub, ZMQ.ZMQ_POLLIN());
while (!ZMQ.isInterrupted()) {
var timeLeft = Std.int(sendStateAt - Date.now().getTime());
if (timeLeft < 0)
timeLeft = 0;
try {
var res = poller.poll(timeLeft * 1000); // Convert timeout to microseconds
} catch (e:ZMQException) {
if (!ZMQ.isInterrupted()) {
trace("ZMQException #:" + e.errNo + ", str:" + e.str());
trace (Stack.toString(Stack.exceptionStack()));
} else {
Lib.println("W: interrupt received, killing server...");
}
ctx.destroy();
return;
}
if (poller.pollin(1)) {
// Have a client request
var msg = ZMsg.recvMsg(frontend);
event = CLIENT_REQUEST;
if (!stateMachine())
// Answer client by echoing request back
msg.send(frontend); // Pretend do some work and then reply
else
msg.destroy();
}
if (poller.pollin(2)) {
// Have state from our peer, execute as event
var message = stateSub.recvMsg().toString();
event = Type.createEnumIndex(EventT, Std.parseInt(message));
if (stateMachine())
break; // Error, so exit
peerExpiry = Date.now().getTime() + (2 * HEARTBEAT);
}
// If we timed-out, send state to peer
if (Date.now().getTime() >= sendStateAt) {
statePub.sendMsg(Bytes.ofString(Std.string(Type.enumIndex(state))));
sendStateAt = Date.now().getTime() + HEARTBEAT;
}
}
ctx.destroy();
}
/**
* Executes finite state machine (apply event to this state)
* Returns true if there was an exception
* @return
 */
public function stateMachine():Bool
{
var exception = false;
switch (state) {
case STATE_PRIMARY:
// Primary server is waiting for peer to connect
// Accepts CLIENT_REQUEST events in this state
switch (event) {
case PEER_BACKUP:
Lib.println("I: connected to backup (slave), ready as master");
state = STATE_ACTIVE;
case PEER_ACTIVE:
Lib.println("I: connected to backup (master), ready as slave");
state = STATE_PASSIVE;
default:
}
case STATE_BACKUP:
// Backup server is waiting for peer to connect
// Rejects CLIENT_REQUEST events in this state
switch (event) {
case PEER_ACTIVE:
Lib.println("I: connected to primary (master), ready as slave");
state = STATE_PASSIVE;
case CLIENT_REQUEST:
exception = true;
default:
}
case STATE_ACTIVE:
// Server is active
// Accepts CLIENT_REQUEST events in this state
switch (event) {
case PEER_ACTIVE:
// Two masters would mean split-brain
Lib.println("E: fatal error - dual masters, aborting");
exception = true;
default:
}
case STATE_PASSIVE:
// Server is passive
// CLIENT_REQUEST events can trigger failover if peer looks dead
switch (event) {
case PEER_PRIMARY:
// Peer is restarting - become active, peer will go passive
Lib.println("I: primary (slave) is restarting, ready as master");
state = STATE_ACTIVE;
case PEER_BACKUP:
// Peer is restarting - become active, peer will go passive
Lib.println("I: backup (slave) is restarting, ready as master");
state = STATE_ACTIVE;
case PEER_PASSIVE:
// Two passives would mean cluster would be non-responsive
Lib.println("E: fatal error - dual slaves, aborting");
exception = true;
case CLIENT_REQUEST:
// Peer becomes master if timeout has passed
// It's the client request that triggers the failover
if (Date.now().getTime() >= peerExpiry) {
// If peer is dead, switch to the active state
Lib.println("I: failover successful, ready as master");
state = STATE_ACTIVE;
} else {
Lib.println("I: peer is active, so ignore connection");
exception = true;
}
default:
}
}
return exception;
}
public static function main() {
Lib.println("** BStarSrv (see: http://zguide.zeromq.org/page:all#Binary-Star-Implementation)");
var state:StateT = null;
var argArr = Sys.args();
if (argArr.length > 1 && argArr[argArr.length - 1] == "-p") {
state = STATE_PRIMARY;
} else if (argArr.length > 1 && argArr[argArr.length - 1] == "-b") {
state = STATE_BACKUP;
} else {
Lib.println("Usage: bstartsrv { -p | -b }");
return;
}
var bstarServer = new BStarSrv(state);
bstarServer.run();
}
}
// States we can be in at any time
private enum StateT {
STATE_PRIMARY; // Primary, waiting for peer to connect
STATE_BACKUP; // Backup, waiting for peer to connect
STATE_ACTIVE; // Active - accepting connections
STATE_PASSIVE; // Passive - not accepting connections
}
privateenum EventT {
PEER_PRIMARY; // HA peer is pending primary
PEER_BACKUP; // HA peer is pending backup
PEER_ACTIVE; // HA peer is active
PEER_PASSIVE; // HA peer is passive
CLIENT_REQUEST; // Client makes request
}
bstarsrv: Binary Star server in Java
package guide;
import org.zeromq.SocketType;
import org.zeromq.ZContext;
import org.zeromq.ZMQ;
import org.zeromq.ZMQ.Poller;
import org.zeromq.ZMQ.Socket;
import org.zeromq.ZMsg;
// Binary Star server proof-of-concept implementation. This server does no
// real work; it just demonstrates the Binary Star failover model.
public class bstarsrv
{
// States we can be in at any point in time
enum State
{
STATE_PRIMARY, // Primary, waiting for peer to connect
STATE_BACKUP, // Backup, waiting for peer to connect
STATE_ACTIVE, // Active - accepting connections
STATE_PASSIVE // Passive - not accepting connections
}
// Events, which start with the states our peer can be in
enum Event
{
PEER_PRIMARY, // HA peer is pending primary
PEER_BACKUP, // HA peer is pending backup
PEER_ACTIVE, // HA peer is active
PEER_PASSIVE, // HA peer is passive
CLIENT_REQUEST // Client makes request
}
// Our finite state machine
private State state; // Current state
private Event event; // Current event
private long peerExpiry; // When peer is considered 'dead'
// We send state information this often
// If peer doesn't respond in two heartbeats, it is 'dead'
private final static long HEARTBEAT = 1000; // In msecs
// .split Binary Star state machine
// The heart of the Binary Star design is its finite-state machine (FSM).
// The FSM runs one event at a time. We apply an event to the current state,
// which checks if the event is accepted, and if so, sets a new state:
private boolean stateMachine()
{
boolean exception = false;
// These are the PRIMARY and BACKUP states; we're waiting to become
// ACTIVE or PASSIVE depending on events we get from our peer:
if (state == State.STATE_PRIMARY) {
if (event == Event.PEER_BACKUP) {
System.out.printf("I: connected to backup (passive), ready active\n");
state = State.STATE_ACTIVE;
}
else if (event == Event.PEER_ACTIVE) {
System.out.printf("I: connected to backup (active), ready passive\n");
state = State.STATE_PASSIVE;
}
// Accept client connections
}
else if (state == State.STATE_BACKUP) {
if (event == Event.PEER_ACTIVE) {
System.out.printf("I: connected to primary (active), ready passive\n");
state = State.STATE_PASSIVE;
}
else // Reject client connections when acting as backup
if (event == Event.CLIENT_REQUEST)
exception = true;
}
else // .split active and passive states
// These are the ACTIVE and PASSIVE states:
if (state == State.STATE_ACTIVE) {
if (event == Event.PEER_ACTIVE) {
// Two actives would mean split-brain
System.out.printf("E: fatal error - dual actives, aborting\n");
exception = true;
}
}
else // Server is passive
// CLIENT_REQUEST events can trigger failover if peer looks dead
if (state == State.STATE_PASSIVE) {
if (event == Event.PEER_PRIMARY) {
// Peer is restarting - become active, peer will go passive
System.out.printf("I: primary (passive) is restarting, ready active\n");
state = State.STATE_ACTIVE;
}
else if (event == Event.PEER_BACKUP) {
// Peer is restarting - become active, peer will go passive
System.out.printf("I: backup (passive) is restarting, ready active\n");
state = State.STATE_ACTIVE;
}
else if (event == Event.PEER_PASSIVE) {
// Two passives would mean cluster would be non-responsive
System.out.printf("E: fatal error - dual passives, aborting\n");
exception = true;
}
else if (event == Event.CLIENT_REQUEST) {
// Peer becomes active if timeout has passed
// It's the client request that triggers the failover
assert (peerExpiry > 0);
if (System.currentTimeMillis() >= peerExpiry) {
// If peer is dead, switch to the active state
System.out.printf("I: failover successful, ready active\n");
state = State.STATE_ACTIVE;
}
else // If peer is alive, reject connections
exception = true;
}
}
return exception;
}
// .split main task
// This is our main task. First we bind/connect our sockets with our
// peer and make sure we will get state messages correctly. We use
// three sockets; one to publish state, one to subscribe to state, and
// one for client requests/replies:
public static void main(String[] argv)
{
// Arguments can be either of:
// -p primary server, at tcp://localhost:5001
// -b backup server, at tcp://localhost:5002
try (ZContext ctx = new ZContext()) {
Socket statepub = ctx.createSocket(SocketType.PUB);
Socket statesub = ctx.createSocket(SocketType.SUB);
statesub.subscribe(ZMQ.SUBSCRIPTION_ALL);
Socket frontend = ctx.createSocket(SocketType.ROUTER);
bstarsrv fsm = new bstarsrv();
if (argv.length == 1 && argv[0].equals("-p")) {
System.out.printf("I: Primary active, waiting for backup (passive)\n");
frontend.bind("tcp://*:5001");
statepub.bind("tcp://*:5003");
statesub.connect("tcp://localhost:5004");
fsm.state = State.STATE_PRIMARY;
}
else if (argv.length == 1 && argv[0].equals("-b")) {
System.out.printf("I: Backup passive, waiting for primary (active)\n");
frontend.bind("tcp://*:5002");
statepub.bind("tcp://*:5004");
statesub.connect("tcp://localhost:5003");
fsm.state = State.STATE_BACKUP;
}
else {
System.out.printf("Usage: bstarsrv { -p | -b }\n");
ctx.destroy();
System.exit(0);
}
// .split handling socket input
// We now process events on our two input sockets, and process
// these events one at a time via our finite-state machine. Our
// "work" for a client request is simply to echo it back.
Poller poller = ctx.createPoller(2);
poller.register(frontend, ZMQ.Poller.POLLIN);
poller.register(statesub, ZMQ.Poller.POLLIN);
// Set timer for next outgoing state message
long sendStateAt = System.currentTimeMillis() + HEARTBEAT;
while (!Thread.currentThread().isInterrupted()) {
int timeLeft = (int) ((sendStateAt - System.currentTimeMillis()));
if (timeLeft < 0)
timeLeft = 0;
int rc = poller.poll(timeLeft);
if (rc == -1)
break; // Context has been shut down
if (poller.pollin(0)) {
// Have a client request
ZMsg msg = ZMsg.recvMsg(frontend);
fsm.event = Event.CLIENT_REQUEST;
if (fsm.stateMachine() == false)
// Answer client by echoing request back
msg.send(frontend);
else msg.destroy();
}
if (poller.pollin(1)) {
// Have state from our peer, execute as event
String message = statesub.recvStr();
fsm.event = Event.values()[Integer.parseInt(message)];
if (fsm.stateMachine())
break; // Error, so exit
fsm.peerExpiry = System.currentTimeMillis() + 2 * HEARTBEAT;
}
// If we timed out, send state to peer
if (System.currentTimeMillis() >= sendStateAt) {
statepub.send(String.valueOf(fsm.state.ordinal()));
sendStateAt = System.currentTimeMillis() + HEARTBEAT;
}
}
if (Thread.currentThread().isInterrupted())
System.out.printf("W: interrupted\n");
}
}
}
#!/usr/bin/env ruby# vim: ft=ruby# Binary Star server proof-of-concept implementation. This server does no# real work; it just demonstrates the Binary Star failover model.require'optparse'require'cztop'# We send state information this often# If peer doesn't respond in two heartbeats, it is 'dead'HEARTBEAT = 1000# in msecs# Binary Star finite-state machine.classBStarStateException = Class.new(StandardError)
attr_accessor:stateattr_writer:peer_expirydefinitialize(state, peer_expiry = nil)
unless [:primary, :backup].include? state
abort"invalid initial state #{state.inspect}"end
@state = state
@peer_expiry = peer_expiry
enddef<<(event)
puts"processing event #{event.inspect} ..."case @state
when:primarycase event
when:peer_backupputs"I: connected to backup (passive), ready active"
@state = :activewhen:peer_activeputs"I: connected to backup (active), ready passive"
@state = :passiveend# Accept client connectionswhen:backupcase event
when:peer_activeputs"I: connected to primary (active), ready passive"
@state = :passivewhen:client_request# Reject client connections when acting as backupraiseException, "not active"endwhen:activecase event
when:peer_active# Two actives would mean split-brainputs"E: fatal error - dual actives, aborting"abort"split brain"endwhen:passivecase event
when:peer_primary# Peer is restarting - become active, peer will go passiveputs"I: primary (passive) is restarting, ready active"
@state = :activewhen:peer_backup# Peer is restarting - become active, peer will go passiveputs"I: backup (passive) is restarting, ready active"
@state = :active;
when:peer_passive# Two passives would mean cluster would be non-responsiveputs"E: fatal error - dual passives, aborting"abort"dual passives"when:client_request# Peer becomes active if timeout has passed# It's the client request that triggers the failoverabort"bad peer expiry"unless @peer_expiry
ifTime.now >= @peer_expiry
# If peer is dead, switch to the active stateputs"I: failover successful, ready active"
@state = :activeelse# If peer is alive, reject connectionsraiseException, "peer is alive"endendendendendif __FILE__ == $0
options = {}
OptionParser.new do |opts|
opts.banner = "Usage: #$0 [options]"
opts.on("-p", "--primary", "run as primary server") do |v|
options[:role] = :primaryend
opts.on("-b", "--backup", "run as backup server") do |v|
options[:role] = :backupendend.parse!
unless options[:role]
abort"Usage: #$0 { -p | -b }"end# We use three sockets; one to publish state, one to subscribe to state, and# one for client requests/replies.
statepub = CZTop::Socket::PUB.new
statesub = CZTop::Socket::SUB.new
statesub.subscribe
frontend = CZTop::Socket::ROUTER.new
# We bind/connect our sockets with our peer and make sure we will get state# messages correctly.case options[:role]
when:primaryputs"I: Primary master, waiting for backup (slave)"
statepub.bind("tcp://*:5003")
statesub.connect("tcp://localhost:5004")
frontend.bind("tcp://*:5001")
bstar = BStarState.new(:primary)
when:backupputs"I: Backup slave, waiting for primary (master)"
statepub.bind("tcp://*:5004")
statesub.connect("tcp://localhost:5003")
statesub.subscribe
frontend.bind("tcp://*:5002")
bstar = BStarState.new(:backup)
end# We now process events on our two input sockets, and process these events# one at a time via our finite-state machine. Our "work" for a client# request is simply to echo it back:
poller = CZTop::Poller.new(statesub, frontend)
send_state_at = Time.now + (HEARTBEAT/1000.0)
whiletrue# round to msec resolution to avoid polling bursts
time_left = (send_state_at - Time.now).round(3)
time_left = 0if time_left < 0
time_left = (time_left * 1000).to_i # convert to mseccase poller.simple_wait(time_left)
when statesub
# state from peer
msg = statesub.receive
puts"received message from statesub: #{msg.to_a.inspect}"
bstar << :"peer_#{msg[0]}"# this could exit the process
bstar.peer_expiry = Time.now + 2 * (HEARTBEAT/1000.0)
when frontend
# client request
msg = frontend.receive
puts"received message from frontend: #{msg.to_a.inspect}"begin
bstar << :client_request
frontend << msg
rescueBStarState::Exception# We got a client request even though we're passive AND peer is alive.# We'll just ignore it.endend# If we timed out, send state to peer.ifTime.now >= send_state_at
statepub << bstar.state.to_s
send_state_at = Time.now + (HEARTBEAT/1000.0)
endendend
## Binary Star server
#
package require TclOO
package require zmq
# Arguments can be either of:
# -p primary server, at tcp://localhost:5001
# -b backup server, at tcp://localhost:5002
if{[llength$argv] != 1 || [lindex$argv0]ni{-p -b}}{puts"Usage: bstarsrv.tcl <-p|-b>"exit1}# We send state information every this often
# If peer doesn't respond in two heartbeats, it is 'dead'
set HEARTBEAT 1000;# In msecs
# States we can be in at any point in time
set STATE(NONE)0set STATE(PRIMARY)1;# Primary, waiting for peer to connect
set STATE(BACKUP)2;# Backup, waiting for peer to connect
set STATE(ACTIVE)3;# Active - accepting connections
set STATE(PASSIVE)4;# Passive - not accepting connections
# Events, which start with the states our peer can be in
set EVENT(NONE)0set EVENT(PRIMARY)1;# HA peer is pending primary
set EVENT(BACKUP)2;# HA peer is pending backup
set EVENT(ACTIVE)3;# HA peer is active
set EVENT(PASSIVE)4;# HA peer is passive
set EVENT(REQUEST)5;# Client makes request
# Our finite state machine
oo::class create BStar {variable state event peer_expiry
constructor{}{set state NONE
set event NONE
set peer_expiry 0}destructor{}method state_machine {}{set exception 0if{$stateeq"PRIMARY"}{# Primary server is waiting for peer to connect
# Accepts CLIENT_REQUEST events in this state
if{$eventeq"BACKUP"}{puts"I: connected to backup (slave), ready as master"set state ACTIVE
}elseif{$eventeq"ACTIVE"}{puts"I: connected to backup (master), ready as slave"set state PASSIVE
}}elseif{$stateeq"BACKUP"}{# Backup server is waiting for peer to connect
# Rejects CLIENT_REQUEST events in this state
if{$eventeq"ACTIVE"}{puts"I: connected to primary (master), ready as slave"set state PASSIVE
}elseif{$eventeq"REQUEST"}{set exception 1}}elseif{$stateeq"ACTIVE"}{# Server is active
# Accepts CLIENT_REQUEST events in this state
if{$eventeq"ACTIVE"}{# Two masters would mean split-brain
puts"E: fatal error - dual masters, aborting"set exception 1}}elseif{$stateeq"PASSIVE"}{# Server is passive
# CLIENT_REQUEST events can trigger failover if peer looks dead
if{$eventeq"PRIMARY"}{# Peer is restarting - become active, peer will go passive
puts"I: primary (slave) is restarting, ready as master"set state ACTIVE
}elseif{$eventeq"BACKUP"}{# Peer is restarting - become active, peer will go passive
puts"I: backup (slave) is restarting, ready as master"set state ACTIVE
}elseif{$eventeq"PASSIVE"}{# Two passives would mean cluster would be non-responsive
puts"E: fatal error - dual slaves, aborting"set exception 1}elseif{$eventeq"REQUEST"}{# Peer becomes master if timeout has passed
# It's the client request that triggers the failover
if{$peer_expiry <= 0}{error"peer_expiry must be > 0"}if{[clock milliseconds] >= $peer_expiry}{# If peer is dead, switch to the active state
puts"I: failover successful, ready as master"set state ACTIVE
}else{# If peer is alive, reject connections
set exception 1}}}return$exception}method set_state {istate}{set state $istate}method set_event {ievent}{set event $ievent}method state {}{return$state}method update_peer_expiry {}{set peer_expiry [expr{[clock milliseconds] + 2 * $::HEARTBEAT}]}}zmq context context
zmq socket statepub context PUB
zmq socket statesub context SUB
statesub setsockopt SUBSCRIBE ""zmq socket frontend context ROUTER
set fsm [BStar new]if{[lindex$argv0]eq"-p"}{puts"I: Primary master, waiting for backup (slave)"frontend bind "tcp://*:5001"statepub bind "tcp://*:5003"statesub connect "tcp://localhost:5004"$fsmset_state PRIMARY
}elseif{[lindex$argv0]eq"-b"}{puts"I: Backup slave, waiting for primary (master)"frontend bind "tcp://*:5002"statepub bind "tcp://*:5004"statesub connect "tcp://localhost:5003"$fsmset_state BACKUP
}# Set timer for next outgoing state message
set send_state_at [expr{[clock milliseconds] + $HEARTBEAT}]while{1}{set timeleft [expr{$send_state_at-[clock milliseconds]}]if{$timeleft < 0}{set timeleft 0}foreach rpoll [zmq poll {{frontend{POLLIN}}{statesub{POLLIN}}}$timeleft]{switch -exact -- [lindex$rpoll0]{frontend{# Have a client request
set msg [zmsg recv frontend]$fsmset_event REQUEST
if{[$fsmstate_machine] == 0}{zmsg send frontend $msg}}statesub{# Have state from our peer, execute as event
set state [statesub recv]$fsmset_event$stateif{[$fsmstate_machine]}{break;# Error, so exit
}$fsmupdate_peer_expiry}}}# If we timed-out, send state to peer
if{[clock milliseconds] >= $send_state_at}{statepub send [$fsmstate]set send_state_at [expr{[clock milliseconds] + $HEARTBEAT}]}}statepub close
statesub close
frontend close
context term
// Binary Star client proof-of-concept implementation. This client does no
// real work; it just demonstrates the Binary Star failover model.
#include "czmq.h"

#define REQUEST_TIMEOUT 1000    // msecs
#define SETTLE_DELAY 2000 // Before failing over
int main (void)
{
zctx_t *ctx = zctx_new ();
char *server [] = { "tcp://localhost:5001", "tcp://localhost:5002" };
uint server_nbr = 0;
printf ("I: connecting to server at %s...\n", server [server_nbr]);
void *client = zsocket_new (ctx, ZMQ_REQ);
zsocket_connect (client, server [server_nbr]);
int sequence = 0;
while (!zctx_interrupted) {
// We send a request, then we work to get a reply
char request [10];
sprintf (request, "%d", ++sequence);
zstr_send (client, request);
int expect_reply = 1;
while (expect_reply) {
// Poll socket for a reply, with timeout
zmq_pollitem_t items [] = { { client, 0, ZMQ_POLLIN, 0 } };
int rc = zmq_poll (items, 1, REQUEST_TIMEOUT * ZMQ_POLL_MSEC);
if (rc == -1)
break; // Interrupted
// .split main body of client
// We use a Lazy Pirate strategy in the client. If there's no
// reply within our timeout, we close the socket and try again.
// In Binary Star, it's the client vote that decides which
// server is primary; the client must therefore try to connect
// to each server in turn:
if (items [0].revents & ZMQ_POLLIN) {
// We got a reply from the server, must match sequence
char *reply = zstr_recv (client);
if (atoi (reply) == sequence) {
printf ("I: server replied OK (%s)\n", reply);
expect_reply = 0;
sleep (1); // One request per second
}
else
printf ("E: bad reply from server: %s\n", reply);
free (reply);
}
else {
printf ("W: no response from server, failing over\n");
// Old socket is confused; close it and open a new one
zsocket_destroy (ctx, client);
server_nbr = (server_nbr + 1) % 2;
zclock_sleep (SETTLE_DELAY);
printf ("I: connecting to server at %s...\n",
server [server_nbr]);
client = zsocket_new (ctx, ZMQ_REQ);
zsocket_connect (client, server [server_nbr]);
// Send request again, on new socket
zstr_send (client, request);
}
}
}
zctx_destroy (&ctx);
return 0;
}
bstarcli: Binary Star client in C++
// Binary Star client proof-of-concept implementation. This client does no
// real work; it just demonstrates the Binary Star failover model.
#include "zmsg.hpp"

#define REQUEST_TIMEOUT 1000    // msecs
#define SETTLE_DELAY 2000 // Before failing over
#define ZMQ_POLL_MSEC 1 // zmq_poll delay
int main(void) {
zmq::context_t context(1);
char *server [] = {"tcp://localhost:5001", "tcp://localhost:5002"};
uint server_nbr = 0;
std::cout << "I: connecting to " << server[server_nbr] << "..." << std::endl;
zmq::socket_t *client = new zmq::socket_t(context, ZMQ_REQ);
// Configure socket to not wait at close time
int linger = 0;
client->setsockopt (ZMQ_LINGER, &linger, sizeof (linger));
client->connect(server[server_nbr]);
int sequence = 0;
while(true) {
// We send a request, then we work to get a reply
std::string request_string = std::to_string(++sequence);
s_send(*client, request_string);
int expect_reply = 1;
while(expect_reply) {
zmq::pollitem_t items[] = {{*client, 0, ZMQ_POLLIN, 0}};
try {
zmq::poll(items, 1, REQUEST_TIMEOUT * ZMQ_POLL_MSEC);
} catch (std::exception &e) {
break; // Interrupted
}
// .split main body of client
// We use a Lazy Pirate strategy in the client. If there's no
// reply within our timeout, we close the socket and try again.
// In Binary Star, it's the client vote that decides which
// server is primary; the client must therefore try to connect
// to each server in turn:
if (items[0].revents & ZMQ_POLLIN) {
// We got a reply from the server, must match sequence
std::string reply = s_recv(*client);
if (std::stoi(reply) == sequence) {
std::cout << "I: server replied OK (" << reply << ")" << std::endl;
expect_reply = 0;
s_sleep(1000); // One request per second
} else {
std::cout << "E: bad reply from server: " << reply << std::endl;
}
} else {
std::cout << "W: no response from server, failing over" << std::endl;
// Old socket is confused; close it and open a new one
delete client;
server_nbr = (server_nbr + 1) % 2;
s_sleep(SETTLE_DELAY);
std::cout << "I: connecting to " << server[server_nbr] << "..." << std::endl;
client = new zmq::socket_t(context, ZMQ_REQ);
linger = 0;
client->setsockopt(ZMQ_LINGER, &linger, sizeof(linger));
client->connect(server[server_nbr]);
// Send request again, on new socket
s_send(*client, request_string);
}
}
}
return 0;
}
package ;
import haxe.Stack;
import neko.Lib;
import neko.Sys;
import org.zeromq.ZContext;
import org.zeromq.ZMQ;
import org.zeromq.ZMQPoller;
import org.zeromq.ZMsg;
import org.zeromq.ZMQException;
/**
* Binary Star Client
* @author Richard J Smith
*
* @see http://zguide.zeromq.org/page:all#Binary-Star-Implementation
 */
class BStarCli
{
    private static inline var REQUEST_TIMEOUT = 1000;    // msecs
    private static inline var SETTLE_DELAY = 2000;       // Before failing over

    public static function main()
{
Lib.println("** BStarCli (see: http://zguide.zeromq.org/page:all#Binary-Star-Implementation)");
var ctx = new ZContext();
var server = ["tcp://localhost:5001", "tcp://localhost:5002"];
var server_nbr = 0;
Lib.println("I: connecting to server at " + server[server_nbr]);
var client = ctx.createSocket(ZMQ_REQ);
client.connect(server[server_nbr]);
var sequence = 0;
var poller = new ZMQPoller();
poller.registerSocket(client, ZMQ.ZMQ_POLLIN());
while (!ZMQ.isInterrupted()) {
            // We send a request, then we work to get a reply
            var request = Std.string(++sequence);
ZMsg.newStringMsg(request).send(client);
var expectReply = true;
while (expectReply) {
                // Poll socket for a reply, with timeout
                try {
var res = poller.poll(REQUEST_TIMEOUT * 1000); // Convert timeout to microseconds
} catch (e:ZMQException) {
if (!ZMQ.isInterrupted()) {
trace("ZMQException #:" + e.errNo + ", str:" + e.str());
trace (Stack.toString(Stack.exceptionStack()));
} else {
Lib.println("W: interrupt received, killing client...");
}
ctx.destroy();
return;
}
if (poller.pollin(1)) {
                    // We got a reply from the server, must match sequence
                    var reply = client.recvMsg().toString();
if (reply != null && Std.parseInt(reply) == sequence) {
Lib.println("I: server replied OK (" + reply + ")");
expectReply = false;
Sys.sleep(1.0); // One request per second
} else
Lib.println("E: malformed reply from server: " + reply);
} else {
Lib.println("W: no response from server, failing over");
// Old socket is confused; close it and open a new one
ctx.destroySocket(client);
server_nbr = (server_nbr + 1) % 2;
Sys.sleep(SETTLE_DELAY / 1000);
Lib.println("I: connecting to server at " + server[server_nbr]);
client = ctx.createSocket(ZMQ_REQ);
client.connect(server[server_nbr]);
poller.unregisterAllSockets();
poller.registerSocket(client, ZMQ.ZMQ_POLLIN());
ZMsg.newStringMsg(request).send(client);
}
}
}
ctx.destroy();
}
}
bstarcli: Binary Star client in Java
package guide;

import org.zeromq.SocketType;
import org.zeromq.ZContext;
import org.zeromq.ZMQ;
import org.zeromq.ZMQ.Poller;
import org.zeromq.ZMQ.Socket;
// Binary Star client proof-of-concept implementation. This client does no
// real work; it just demonstrates the Binary Star failover model.
public class bstarcli
{
    private static final long REQUEST_TIMEOUT = 1000; // msecs
    private static final long SETTLE_DELAY = 2000; // Before failing over

    public static void main(String[] argv) throws Exception
{
try (ZContext ctx = new ZContext()) {
String[] server = { "tcp://localhost:5001",
"tcp://localhost:5002" };
int serverNbr = 0;
System.out.printf("I: connecting to server at %s...\n",
server[serverNbr]);
Socket client = ctx.createSocket(SocketType.REQ);
client.connect(server[serverNbr]);
Poller poller = ctx.createPoller(1);
poller.register(client, ZMQ.Poller.POLLIN);
int sequence = 0;
while (!Thread.currentThread().isInterrupted()) {
// We send a request, then we work to get a reply
String request = String.format("%d", ++sequence);
client.send(request);
boolean expectReply = true;
while (expectReply) {
// Poll socket for a reply, with timeout
int rc = poller.poll(REQUEST_TIMEOUT);
if (rc == -1)
break; // Interrupted
// .split main body of client
// We use a Lazy Pirate strategy in the client. If there's
// no reply within our timeout, we close the socket and try
// again. In Binary Star, it's the client vote that
// decides which server is primary; the client must
// therefore try to connect to each server in turn:
if (poller.pollin(0)) {
// We got a reply from the server, must match getSequence
String reply = client.recvStr();
if (Integer.parseInt(reply) == sequence) {
System.out.printf("I: server replied OK (%s)\n", reply);
expectReply = false;
Thread.sleep(1000); // One request per second
}
else System.out.printf("E: bad reply from server: %s\n", reply);
}
else {
System.out.printf("W: no response from server, failing over\n");
// Old socket is confused; close it and open a new one
poller.unregister(client);
ctx.destroySocket(client);
serverNbr = (serverNbr + 1) % 2;
Thread.sleep(SETTLE_DELAY);
System.out.printf("I: connecting to server at %s...\n", server[serverNbr]);
client = ctx.createSocket(SocketType.REQ);
client.connect(server[serverNbr]);
poller.register(client, ZMQ.Poller.POLLIN);
// Send request again, on new socket
client.send(request);
}
}
}
}
}
}
#!/usr/bin/env ruby
# vim: ft=ruby

# Binary Star client proof-of-concept implementation. This client does no
# real work; it just demonstrates the Binary Star failover model.

require 'optparse'
require 'cztop'

REQUEST_TIMEOUT = 1000 # msecs
SETTLE_DELAY = 2000 # before failing over
SERVERS = %w[tcp://localhost:5001 tcp://localhost:5002]

server_nbr = 0
puts "I: connecting to server at %s…" % SERVERS[server_nbr]
client = CZTop::Socket::REQ.new(SERVERS[server_nbr])
sequence = 0
poller = CZTop::Poller.new(client)

while true
  sequence += 1
  puts sequence
  client << "#{sequence}"

  expect_reply = true
  while expect_reply
    # We use a Lazy Pirate strategy in the client. If there's no
    # reply within our timeout, we close the socket and try again.
    # In Binary Star, it's the client vote that decides which
    # server is primary; the client must therefore try to connect
    # to each server in turn:
    if poller.simple_wait(REQUEST_TIMEOUT)
      reply = client.receive
      # We got a reply from the server, must match sequence
      if reply[0].to_i == sequence
        puts "I: server replied OK (%p)" % reply[0]
        expect_reply = false
        sleep(1) # one request per second
      else
        puts "E: bad reply from server: %p" % reply
      end
    else
      puts "W: no response from server, failing over"
      # Old socket is confused; close it and open a new one
      poller.remove_reader(client)
      client.close
      server_nbr = (server_nbr + 1) % 2
      sleep(SETTLE_DELAY / 1000.0)
      puts "I: connecting to server at %s…\n" % SERVERS[server_nbr]
      client = CZTop::Socket::REQ.new(SERVERS[server_nbr])
      poller.add_reader(client)
      client << "#{sequence}"
    end
  end
end
#
# Binary Star client
#
package require zmq

set REQUEST_TIMEOUT 1000 ;# msecs
set SETTLE_DELAY 2000    ;# Before failing over, msecs

zmq context context

set server [list "tcp://localhost:5001" "tcp://localhost:5002"]
set server_nbr 0

puts "I: connecting to server at [lindex $server $server_nbr]..."
zmq socket client context REQ
client connect [lindex $server $server_nbr]

set sequence 0
while {1} {
    # We send a request, then we work to get a reply
    set request [incr sequence]
    client send $request

    set expect_reply 1
    while {$expect_reply} {
        # Poll socket for a reply, with timeout
        set rpoll_set [zmq poll {{client {POLLIN}}} $REQUEST_TIMEOUT]
        # If we got a reply, process it
        if {[llength $rpoll_set] && "POLLIN" in [lindex $rpoll_set 0 1]} {
            set reply [client recv]
            if {$reply eq $request} {
                puts "I: server replied OK ($reply)"
                set expect_reply 0
                after 1000 ;# One request per second
            } else {
                puts "E: malformed reply from server: $reply"
            }
        } else {
            puts "W: no response from server, failing over"
            # Old socket is confused; close it and open a new one
            client close
            set server_nbr [expr {($server_nbr + 1) % 2}]
            after $SETTLE_DELAY
            puts "I: connecting to server at [lindex $server $server_nbr]..."
            zmq socket client context REQ
            client connect [lindex $server $server_nbr]
            # Send request again, on new socket
            client send $request
        }
    }
}

client close
context term
You can then provoke failover by killing the primary server, and recovery by restarting the primary and killing the backup. Note how it's the client vote that triggers both failover and recovery.
Binary Star is driven by a finite state machine. Events are the peer state, so "Peer Active" means the other server has told us it's active. "Client Request" means we've received a client request. "Client Vote" means we've received a client request AND our peer has been inactive for two heartbeats.
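To make the "Client Vote" rule concrete, here is a minimal sketch of the check a passive server makes (the names are illustrative, not the bstar API we build below): a client request only counts as a vote to take over once our record of the peer's last state message is more than two heartbeats old.
// Sketch only: decide whether a client request counts as a "vote"
// (illustrative names; the real check lives in the FSM code below)
#include <stdint.h>
#include <stdbool.h>

#define HEARTBEAT 1000 // msecs

// peer_expiry is refreshed to now + 2 * HEARTBEAT on every state message
static bool
s_client_vote (int64_t peer_expiry, int64_t now)
{
    return now >= peer_expiry; // Peer silent for two heartbeats: take over
}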
Note that the servers use PUB-SUB sockets for state exchange. No other socket combination will work here. PUSH and DEALER block if there is no peer ready to receive a message. PAIR does not reconnect if the peer disappears and comes back. ROUTER needs the address of the peer before it can send it a message.
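As a rough sketch of that wiring only (the endpoints are invented for illustration; run a second copy with them swapped), each server binds a PUB socket for its own state and connects a SUB socket to its peer. Because PUB-SUB never blocks and quietly reconnects, the state exchange keeps working whichever peer dies and restarts:
// Sketch of the Binary Star state exchange and nothing else
// (endpoints are illustrative)
#include "czmq.h"

int main (void)
{
    zctx_t *ctx = zctx_new ();
    void *statepub = zsocket_new (ctx, ZMQ_PUB);
    zsocket_bind (statepub, "tcp://*:5003");

    void *statesub = zsocket_new (ctx, ZMQ_SUB);
    zsocket_set_subscribe (statesub, "");
    zsocket_connect (statesub, "tcp://localhost:5004");

    while (!zctx_interrupted) {
        // Broadcast our state and print whatever the peer has sent
        zstr_send (statepub, "ACTIVE");
        char *state = zstr_recv_nowait (statesub);
        if (state) {
            printf ("I: peer state is %s\n", state);
            free (state);
        }
        zclock_sleep (1000); // One heartbeat per second
    }
    zctx_destroy (&ctx);
    return 0;
}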
Binary Star is useful and generic enough to package up as a reusable reactor class. The reactor then runs and calls our code whenever it has a message to process. This is much nicer than copying/pasting the Binary Star code into each server where we want that capability.
In C, we wrap the CZMQ zloop class that we saw before. zloop lets you register handlers to react on socket and timer events. In the Binary Star reactor, we provide handlers for voters and for state changes (active to passive, and vice versa). Here is the bstar API:
// Create a new Binary Star instance, using local (bind) and
// remote (connect) endpoints to set up the server peering.
bstar_t *bstar_new (int primary, char *local, char *remote);
// Destroy a Binary Star instance
void bstar_destroy (bstar_t **self_p);
// Return underlying zloop reactor, for timer and reader
// registration and cancelation.
zloop_t *bstar_zloop (bstar_t *self);
// Register voting reader
int bstar_voter (bstar_t *self, char *endpoint, int type,
zloop_fn handler, void *arg);
// Register main state change handlers
void bstar_new_active (bstar_t *self, zloop_fn handler, void *arg);
void bstar_new_passive (bstar_t *self, zloop_fn handler, void *arg);
// Start the reactor, which ends if a callback function returns -1,
// or the process received SIGINT or SIGTERM.
int bstar_start (bstar_t *self);
// bstar class - Binary Star reactor
#include"bstar.h"// States we can be in at any point in time
typedefenum {
STATE_PRIMARY = 1, // Primary, waiting for peer to connect
STATE_BACKUP = 2, // Backup, waiting for peer to connect
STATE_ACTIVE = 3, // Active - accepting connections
STATE_PASSIVE = 4// Passive - not accepting connections
} state_t;
// Events, which start with the states our peer can be in
typedefenum {
PEER_PRIMARY = 1, // HA peer is pending primary
PEER_BACKUP = 2, // HA peer is pending backup
PEER_ACTIVE = 3, // HA peer is active
PEER_PASSIVE = 4, // HA peer is passive
CLIENT_REQUEST = 5// Client makes request
} event_t;
// Structure of our class
struct _bstar_t {
zctx_t *ctx; // Our private context
zloop_t *loop; // Reactor loop
void *statepub; // State publisher
void *statesub; // State subscriber
state_t state; // Current state
event_t event; // Current event
int64_t peer_expiry; // When peer is considered 'dead'
zloop_fn *voter_fn; // Voting socket handler
void *voter_arg; // Arguments for voting handler
zloop_fn *active_fn; // Call when become active
void *active_arg; // Arguments for handler
zloop_fn *passive_fn; // Call when become passive
void *passive_arg; // Arguments for handler
};
// The finite-state machine is the same as in the proof-of-concept server.
// To understand this reactor in detail, first read the CZMQ zloop class.
// .skip
// We send state information every this often
// If peer doesn't respond in two heartbeats, it is 'dead'
#define BSTAR_HEARTBEAT 1000 // In msecs
// Binary Star finite state machine (applies event to state)
// Returns -1 if there was an exception, 0 if event was valid.
static int s_execute_fsm (bstar_t *self)
{
int rc = 0;
// Primary server is waiting for peer to connect
// Accepts CLIENT_REQUEST events in this state
if (self->state == STATE_PRIMARY) {
if (self->event == PEER_BACKUP) {
zclock_log ("I: connected to backup (passive), ready as active");
self->state = STATE_ACTIVE;
if (self->active_fn)
(self->active_fn) (self->loop, NULL, self->active_arg);
}
else if (self->event == PEER_ACTIVE) {
zclock_log ("I: connected to backup (active), ready as passive");
self->state = STATE_PASSIVE;
if (self->passive_fn)
(self->passive_fn) (self->loop, NULL, self->passive_arg);
}
else if (self->event == CLIENT_REQUEST) {
// Allow client requests to turn us into the active if we've
// waited sufficiently long to believe the backup is not
// currently acting as active (i.e., after a failover)
assert (self->peer_expiry > 0);
if (zclock_time () >= self->peer_expiry) {
zclock_log ("I: request from client, ready as active");
self->state = STATE_ACTIVE;
if (self->active_fn)
(self->active_fn) (self->loop, NULL, self->active_arg);
}
else
// Don't respond to clients yet - it's possible we're
// performing a failback and the backup is currently active
rc = -1;
}
}
else
// Backup server is waiting for peer to connect
// Rejects CLIENT_REQUEST events in this state
if (self->state == STATE_BACKUP) {
if (self->event == PEER_ACTIVE) {
zclock_log ("I: connected to primary (active), ready as passive");
self->state = STATE_PASSIVE;
if (self->passive_fn)
(self->passive_fn) (self->loop, NULL, self->passive_arg);
}
else if (self->event == CLIENT_REQUEST)
rc = -1;
}
else
// Server is active
// Accepts CLIENT_REQUEST events in this state
// The only way out of ACTIVE is death
if (self->state == STATE_ACTIVE) {
if (self->event == PEER_ACTIVE) {
// Two actives would mean split-brain
zclock_log ("E: fatal error - dual actives, aborting");
rc = -1;
}
}
else
// Server is passive
// CLIENT_REQUEST events can trigger failover if peer looks dead
if (self->state == STATE_PASSIVE) {
if (self->event == PEER_PRIMARY) {
// Peer is restarting - become active, peer will go passive
zclock_log ("I: primary (passive) is restarting, ready as active");
self->state = STATE_ACTIVE;
}
else if (self->event == PEER_BACKUP) {
// Peer is restarting - become active, peer will go passive
zclock_log ("I: backup (passive) is restarting, ready as active");
self->state = STATE_ACTIVE;
}
else if (self->event == PEER_PASSIVE) {
// Two passives would mean cluster would be non-responsive
zclock_log ("E: fatal error - dual passives, aborting");
rc = -1;
}
else if (self->event == CLIENT_REQUEST) {
// Peer becomes active if timeout has passed
// It's the client request that triggers the failover
assert (self->peer_expiry > 0);
if (zclock_time () >= self->peer_expiry) {
// If peer is dead, switch to the active state
zclock_log ("I: failover successful, ready as active");
self->state = STATE_ACTIVE;
}
else
// If peer is alive, reject connections
rc = -1;
}
// Call state change handler if necessary
if (self->state == STATE_ACTIVE && self->active_fn)
(self->active_fn) (self->loop, NULL, self->active_arg);
}
return rc;
}
static void s_update_peer_expiry (bstar_t *self)
{
self->peer_expiry = zclock_time () + 2 * BSTAR_HEARTBEAT;
}
// Reactor event handlers...
// Publish our state to peer
int s_send_state (zloop_t *loop, int timer_id, void *arg)
{
bstar_t *self = (bstar_t *) arg;
zstr_sendf (self->statepub, "%d", self->state);
return 0;
}
// Receive state from peer, execute finite state machine
int s_recv_state (zloop_t *loop, zmq_pollitem_t *poller, void *arg)
{
bstar_t *self = (bstar_t *) arg;
char *state = zstr_recv (poller->socket);
if (state) {
self->event = atoi (state);
s_update_peer_expiry (self);
free (state);
}
return s_execute_fsm (self);
}
// Application wants to speak to us, see if it's possible
int s_voter_ready (zloop_t *loop, zmq_pollitem_t *poller, void *arg)
{
bstar_t *self = (bstar_t *) arg;
// If server can accept input now, call appl handler
self->event = CLIENT_REQUEST;
if (s_execute_fsm (self) == 0)
(self->voter_fn) (self->loop, poller, self->voter_arg);
else {
// Destroy waiting message, no-one to read it
zmsg_t *msg = zmsg_recv (poller->socket);
zmsg_destroy (&msg);
}
return 0;
}
// .until
// .split constructor
// This is the constructor for our {{bstar}} class. We have to tell it
// whether we're primary or backup server, as well as our local and
// remote endpoints to bind and connect to:
bstar_t *
bstar_new (int primary, char *local, char *remote)
{
bstar_t
*self;
self = (bstar_t *) zmalloc (sizeof (bstar_t));
// Initialize the Binary Star
self->ctx = zctx_new ();
self->loop = zloop_new ();
self->state = primary? STATE_PRIMARY: STATE_BACKUP;
// Create publisher for state going to peer
self->statepub = zsocket_new (self->ctx, ZMQ_PUB);
zsocket_bind (self->statepub, local);
// Create subscriber for state coming from peer
self->statesub = zsocket_new (self->ctx, ZMQ_SUB);
zsocket_set_subscribe (self->statesub, "");
zsocket_connect (self->statesub, remote);
// Set-up basic reactor events
zloop_timer (self->loop, BSTAR_HEARTBEAT, 0, s_send_state, self);
zmq_pollitem_t poller = { self->statesub, 0, ZMQ_POLLIN };
zloop_poller (self->loop, &poller, s_recv_state, self);
return self;
}
// .split destructor
// The destructor shuts down the bstar reactor:
void bstar_destroy (bstar_t **self_p)
{
assert (self_p);
if (*self_p) {
bstar_t *self = *self_p;
zloop_destroy (&self->loop);
zctx_destroy (&self->ctx);
free (self);
*self_p = NULL;
}
}
// .split zloop method
// This method returns the underlying zloop reactor, so we can add
// additional timers and readers:
zloop_t *
bstar_zloop (bstar_t *self)
{
return self->loop;
}
// .split voter method
// This method registers a client voter socket. Messages received
// on this socket provide the CLIENT_REQUEST events for the Binary Star
// FSM and are passed to the provided application handler. We require
// exactly one voter per {{bstar}} instance:
int bstar_voter (bstar_t *self, char *endpoint, int type, zloop_fn handler,
void *arg)
{
// Hold actual handler+arg so we can call this later
void *socket = zsocket_new (self->ctx, type);
zsocket_bind (socket, endpoint);
assert (!self->voter_fn);
self->voter_fn = handler;
self->voter_arg = arg;
zmq_pollitem_t poller = { socket, 0, ZMQ_POLLIN };
return zloop_poller (self->loop, &poller, s_voter_ready, self);
}
// .split register state-change handlers
// Register handlers to be called each time there's a state change:
void bstar_new_active (bstar_t *self, zloop_fn handler, void *arg)
{
assert (!self->active_fn);
self->active_fn = handler;
self->active_arg = arg;
}
void bstar_new_passive (bstar_t *self, zloop_fn handler, void *arg)
{
assert (!self->passive_fn);
self->passive_fn = handler;
self->passive_arg = arg;
}
// .split enable/disable tracing
// Enable/disable verbose tracing, for debugging:
void bstar_set_verbose (bstar_t *self, bool verbose)
{
zloop_set_verbose (self->loop, verbose);
}
// .split start the reactor
// Finally, start the configured reactor. It will end if any handler
// returns -1 to the reactor, or if the process receives SIGINT or SIGTERM:
int bstar_start (bstar_t *self)
{
assert (self->voter_fn);
s_update_peer_expiry (self);
return zloop_start (self->loop);
}
bstar: Binary Star core class in C++
#include <zmqpp/zmqpp.hpp>

// We send state information every this often
// If peer doesn't respond in two heartbeats, it is 'dead'
#define BSTAR_HEARTBEAT 1000 // In msecs
// States we can be in at any point in time
typedef enum {
    STATE_NOTSET = 0,           // Before we start, or if the state is invalid
    STATE_PRIMARY = 1,          // Primary, waiting for peer to connect
    STATE_BACKUP = 2,           // Backup, waiting for peer to connect
    STATE_ACTIVE = 3,           // Active - accepting connections
    STATE_PASSIVE = 4           // Passive - not accepting connections
} state_t;

// Events, which start with the states our peer can be in
typedef enum {
    EVENT_NOTSET = 0,           // Before we start, or if the event is invalid
    PEER_PRIMARY = 1,           // HA peer is pending primary
    PEER_BACKUP = 2,            // HA peer is pending backup
    PEER_ACTIVE = 3,            // HA peer is active
    PEER_PASSIVE = 4,           // HA peer is passive
    CLIENT_REQUEST = 5          // Client makes request
} event_t;
class bstar_t {
public:
bstar_t() = delete;
bstar_t(bool primary, std::string local, std::string remote) {
this->ctx = new zmqpp::context_t();
this->loop = new zmqpp::loop();
this->m_state = primary ? STATE_PRIMARY : STATE_BACKUP;
// Create publisher for state going to peer
this->statepub = new zmqpp::socket_t(*this->ctx, zmqpp::socket_type::pub);
this->statepub->bind(local);
// Create subscriber for state coming from peer
this->statesub = new zmqpp::socket_t(*this->ctx, zmqpp::socket_type::sub);
this->statesub->subscribe("");
this->statesub->connect(remote);
// Set-up basic reactor events
this->loop->add(*this->statesub, std::bind(&bstar_t::s_recv_state, this));
this->loop->add(std::chrono::milliseconds(BSTAR_HEARTBEAT), 0, std::bind(&bstar_t::s_send_state, this));
}
~bstar_t() {
if (statepub) delete statepub;
if (statesub) delete statesub;
if (loop) delete loop;
if (ctx) delete ctx;
}
bstar_t(const bstar_t &) = delete;
bstar_t &operator=(const bstar_t &) = delete;
bstar_t(bstar_t &&src) = default;
bstar_t &operator=(bstar_t &&src) = default;
void set_state(state_t state) {
m_state = state;
}
state_t get_state() {
return m_state;
}
void set_peer_expiry(int64_t expiry) {
m_peer_expiry = expiry;
}
void set_voter(std::function<void(zmqpp::loop*, zmqpp::socket_t *socket, void* args)> fn, void *arg) {
voter_fn = fn;
voter_arg = arg;
}
void set_new_active(std::function<void(zmqpp::loop*, zmqpp::socket_t *socket, void* args)> fn, void *arg) {
active_fn = fn;
active_arg = arg;
}
void set_new_passive(std::function<void(zmqpp::loop*, zmqpp::socket_t *socket, void* args)> fn, void *arg) {
passive_fn = fn;
passive_arg = arg;
}
zmqpp::loop *get_loop() {
return loop;
}
// Binary Star finite state machine (applies event to state)
// Returns true if there was an exception, false if event was valid.
bool execute_fsm(event_t event) {
m_event = event;
bool exception = false;
// These are the PRIMARY and BACKUP states; we're waiting to become
// ACTIVE or PASSIVE depending on events we get from our peer:
if (m_state == STATE_PRIMARY) {
if (m_event == PEER_BACKUP) {
std::cout << "I: connected to backup (passive), ready active" << std::endl;
m_state = STATE_ACTIVE;
if (active_fn) {
active_fn(loop, nullptr, active_arg);
}
} else if (m_event == PEER_ACTIVE) {
std::cout << "I: connected to backup (active), ready passive" << std::endl;
m_state = STATE_PASSIVE;
if (passive_fn) {
passive_fn(loop, nullptr, passive_arg);
}
}
// Accept client connections
} else if (m_state == STATE_BACKUP) {
if (m_event == PEER_ACTIVE) {
std::cout << "I: connected to primary (active), ready passive" << std::endl;
m_state = STATE_PASSIVE;
if (passive_fn) {
passive_fn(loop, nullptr, passive_arg);
}
} else if (m_event == CLIENT_REQUEST) {
// Reject client connections when acting as backup
exception = true;
}
// .split active and passive states
// These are the ACTIVE and PASSIVE states:
} else if (m_state == STATE_ACTIVE) {
if (m_event == PEER_ACTIVE) {
std::cout << "E: fatal error - dual actives, aborting" << std::endl;
exception = true;
}
// Server is passive
// CLIENT_REQUEST events can trigger failover if peer looks dead
} else if (m_state == STATE_PASSIVE) {
if (m_event == PEER_PRIMARY) {
// Peer is restarting - become active, peer will go passive
std::cout << "I: primary (passive) is restarting, ready active" << std::endl;
m_state = STATE_ACTIVE;
} else if (m_event == PEER_BACKUP) {
// Peer is restarting - become active, peer will go passive
std::cout << "I: backup (passive) is restarting, ready active" << std::endl;
m_state = STATE_ACTIVE;
} else if (m_event == PEER_PASSIVE) {
// Two passives would mean cluster would be non-responsive
std::cout << "E: fatal error - dual passives, aborting" << std::endl;
exception = true;
} else if (m_event == CLIENT_REQUEST) {
// Peer becomes active if timeout has passed
// It's the client request that triggers the failover
assert(m_peer_expiry > 0);
auto now = std::chrono::system_clock::now();
if (std::chrono::duration_cast<std::chrono::milliseconds>(now.time_since_epoch()).count() >= m_peer_expiry) {
std::cout << "I: failover successful, ready active" << std::endl;
m_state = STATE_ACTIVE;
} else {
// If peer is alive, reject connections
exception = true;
}
}
// Call state change handler if necessary
if (m_state == STATE_ACTIVE && active_fn) {
active_fn(loop, nullptr, active_arg);
}
}
return exception;
}
void update_peer_expiry() {
m_peer_expiry = std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::system_clock::now().time_since_epoch()).count() + 2 * BSTAR_HEARTBEAT;
}
static bool s_send_state(bstar_t *self) {
zmqpp::message_t msg;
msg << static_cast<int>(self->m_state);
std::cout << "I: publishing state " << self->m_state << std::endl;
self->statepub->send(msg);
return true;
}
static bool s_recv_state(bstar_t *self) {
zmqpp::message_t msg;
bool rc = self->statesub->receive(msg);
if (rc) {
int state;
msg >> state;
self->m_event = static_cast<event_t>(state);
self->update_peer_expiry();
}
bool exception = self->execute_fsm(self->m_event);
return !exception;
}
static bool s_voter_ready(bstar_t *self, zmqpp::socket_t *socket) {
self->m_event = CLIENT_REQUEST;
if (self->execute_fsm(self->m_event) == false) {
if (self->voter_fn) {
self->voter_fn(self->loop, socket, self->voter_arg);
}
} else {
// Destroy waiting message, no-one to read it
zmqpp::message_t msg;
socket->receive(msg);
}
return true;
}
// .split voter method
// This method registers a client voter socket. Messages received
// on this socket provide the CLIENT_REQUEST events for the Binary Star
// FSM and are passed to the provided application handler. We require
// exactly one voter per {{bstar}} instance:
int register_voter(std::string endpoint, zmqpp::socket_type type, std::function<void(zmqpp::loop*, zmqpp::socket_t *socket, void* args)> fn, void *arg) {
zmqpp::socket_t *socket = new zmqpp::socket_t(*ctx, type);
socket->bind(endpoint);
assert(!voter_fn);
voter_fn = fn;
voter_arg = arg;
loop->add(*socket, std::bind(&bstar_t::s_voter_ready, this, socket));
return 0;
}
int start() {
assert(voter_fn);
update_peer_expiry();
loop->start();
return 0;
}
private:
zmqpp::context_t *ctx; // Our context
zmqpp::loop *loop; // Reactor loop
zmqpp::socket_t *statepub; // State publisher
zmqpp::socket_t *statesub; // State subscriber
state_t m_state; // Current state
event_t m_event; // Current event
int64_t m_peer_expiry; // When peer is considered 'dead', milliseconds
std::function<void(zmqpp::loop*, zmqpp::socket_t *socket, void* args)> voter_fn; // Voting socket handler
void *voter_arg; // Arguments for voting handler
std::function<void(zmqpp::loop*, zmqpp::socket_t *socket, void* args)> active_fn; // Call when become active
void *active_arg; // Arguments for handler
std::function<void(zmqpp::loop*, zmqpp::socket_t *socket, void* args)> passive_fn; // Call when become passive
void *passive_arg; // Arguments for handler
};
package ;
import haxe.io.Bytes;
import haxe.Stack;
import neko.Sys;
import neko.Lib;
import org.zeromq.ZContext;
import org.zeromq.ZLoop;
import org.zeromq.ZMQ;
import org.zeromq.ZMQPoller;
import org.zeromq.ZMQException;
import org.zeromq.ZMQSocket;
import org.zeromq.ZMsg;
import org.zeromq.ZSocket;
// States we can be in at any time
private enum StateT {
STATE_PRIMARY; // Primary, waiting for peer to connect
STATE_BACKUP; // Backup, waiting for peer to connect
STATE_ACTIVE; // Active - accepting connections
STATE_PASSIVE; // Passive - not accepting connections
}
private enum EventT {
PEER_PRIMARY; // HA peer is pending primary
PEER_BACKUP; // HA peer is pending backup
PEER_ACTIVE; // HA peer is active
PEER_PASSIVE; // HA peer is passive
CLIENT_REQUEST; // Client makes request
}
/**
* Shortcut typedef for method signature of BStar reactor handler functions
 */
typedef HandlerFunctionType = ZLoop->ZMQSocket->Dynamic->Int;
/**
* Binary Star Reactor
*/class BStar
{
/** We send state information every this often
* If peer doesn't respond in two heartbeats, it is 'dead'
     */
    private static inline var BSTAR_HEARTBEAT = 100;
    /** Our context */
    private var ctx:ZContext;
    /** Reactor loop */
    public var loop(default, null):ZLoop;
    /** State publisher socket */
    private var statePub:ZMQSocket;
    /** State subscriber socket */
    private var stateSub:ZMQSocket;
    /** Current state */
    public var state(default, null):StateT;
    /** Current event */
    public var event(default, null):EventT;
    /** When peer is considered 'dead' */
    public var peerExpiry:Float;
    /** Voting socket handler */
    private var voterFn:HandlerFunctionType;
    /** Arguments for voting handler */
    private var voterArgs:Dynamic;
    /** Master socket handler, called when become Master */
    private var masterFn:HandlerFunctionType;
    /** Arguments for Master handler */
    private var masterArgs:Dynamic;
    /** Slave socket handler, called when become Slave */
    private var slaveFn:HandlerFunctionType;
    /** Arguments for slave handler */
    private var slaveArgs:Dynamic;
    /** Print activity to stdout */
    public var verbose:Bool;
    /** Logger function used in verbose mode */
    private var log:Dynamic->Void;
/**
* BStar Constructor
* @param isPrimary True if this instance is the primary instance, else false if slave
* @param local Network address to bind the statePub socket of this instance to
* @param remote Network address to connect the stateSub socket of this instance to
* @param ?verbose True to generate logging info
* @param ?logger Logger function
     */
    public function new(isPrimary:Bool, local:String, remote:String, ?verbose:Bool = false, ?logger:Dynamic->Void)
{
// Initialise the binary star server
ctx = new ZContext();
loop = new ZLoop(logger);
loop.verbose = verbose;
state = { if (isPrimary) STATE_PRIMARY; else STATE_BACKUP; };
// Create publisher for state going to peer
statePub = ctx.createSocket(ZMQ_PUB);
statePub.bind(local);
// Create subscriber for state coming from peer
stateSub = ctx.createSocket(ZMQ_SUB);
stateSub.setsockopt(ZMQ_SUBSCRIBE, Bytes.ofString(""));
stateSub.connect(remote);
// Set up basic reactor events
loop.registerTimer(BSTAR_HEARTBEAT, 0, sendState);
var item = { socket: stateSub, event: ZMQ.ZMQ_POLLIN() };
loop.registerPoller(item, recvState);
this.verbose = verbose;
if (logger != null)
log = logger;
else
log = Lib.println;
}
/**
* Destructor
* Cleans up internal ZLoop reactor object and ZContext objects.
     */
    public function destroy() {
if (loop != null)
loop.destroy();
if (ctx != null)
ctx.destroy();
}
/**
* Create socket, bind to local endpoint, and register as reader for
* voting. The socket will only be available if the Binary Star state
* machine allows it. Input on the socket will act as a "vote" in the
* Binary Star scheme. We require exactly one voter per bstar instance.
*
* @param endpoint Endpoint address
* @param type Socket Type to bind to endpoint
* @param handler Voter Handler method
 * @param args Optional args to pass to Voter Handler method when called.
* @return
     */
    public function setVoter(endpoint:String, type:SocketType, handler:HandlerFunctionType, ?args:Dynamic):Bool {
        // Hold actual handler + arg so we can call this later
        var socket = ctx.createSocket(type);
socket.bind(endpoint);
voterFn = handler;
voterArgs = args;
return loop.registerPoller( { socket:socket, event:ZMQ.ZMQ_POLLIN() }, voterReady);
}
/**
* Sets handler method called when instance becomes Master
* @param handler
* @param ?args
     */
    public function setMaster(handler:HandlerFunctionType, ?args:Dynamic) {
if (masterFn == null) {
masterFn = handler;
masterArgs = args;
}
}
/**
* Sets handler method called when instance becomes Slave
* @param handler
* @param ?args
     */
    public function setSlave(handler:HandlerFunctionType, ?args:Dynamic) {
if (slaveFn == null) {
slaveFn = handler;
slaveArgs = args;
}
}
/**
* Executes finite state machine (apply event to this state)
* Returns true if there was an exception
* @return
     */
    public function stateMachine():Bool
{
var exception = false;
switch (state) {
case STATE_PRIMARY:
// Primary server is waiting for peer to connect// Accepts CLIENT_REQUEST events in this stateswitch (event) {
case PEER_BACKUP:
if (verbose)
log("I: connected to backup (slave), ready as master");
state = STATE_ACTIVE;
if (masterFn != null)
masterFn(loop, null, masterArgs);
case PEER_ACTIVE:
if (verbose)
log("I: connected to backup (master), ready as slave");
state = STATE_PASSIVE;
if (slaveFn != null)
slaveFn(loop, null, slaveArgs);
case CLIENT_REQUEST:
if (verbose)
log("I: request from client, ready as master");
state = STATE_ACTIVE;
if (masterFn != null)
masterFn(loop, null, masterArgs);
default:
}
case STATE_BACKUP:
// Backup server is waiting for peer to connect// Rejects CLIENT_REQUEST events in this stateswitch (event) {
case PEER_ACTIVE:
if (verbose)
log("I: connected to primary (master), ready as slave");
state = STATE_PASSIVE;
if (slaveFn != null)
slaveFn(loop, null, slaveArgs);
case CLIENT_REQUEST:
exception = true;
default:
}
case STATE_ACTIVE:
// Server is active// Accepts CLIENT_REQUEST events in this stateswitch (event) {
case PEER_ACTIVE:
// Two masters would mean split-brain
log("E: fatal error - dual masters, aborting");
exception = true;
default:
}
case STATE_PASSIVE:
// Server is passive// CLIENT_REQUEST events can trigger failover if peer looks deadswitch (event) {
case PEER_PRIMARY:
// Peer is restarting - I become active, peer will go passiveif (verbose)
log("I: primary (slave) is restarting, ready as master");
state = STATE_ACTIVE;
case PEER_BACKUP:
// Peer is restarting - become active, peer will go passiveif (verbose)
log("I: backup (slave) is restarting, ready as master");
state = STATE_ACTIVE;
case PEER_PASSIVE:
// Two passives would mean cluster would be non-responsive
log("E: fatal error - dual slaves, aborting");
exception = true;
case CLIENT_REQUEST:
// Peer becomes master if timeout as passed// It's the client request that triggers the failoverif (Date.now().getTime() >= peerExpiry) {
// If peer is dead, switch to the active stateif (verbose)
log("I: failover successful, ready as master");
state = STATE_ACTIVE;
} else {
if (verbose)
log("I: peer is active, so ignore connection");
exception = true;
}
default:
}
}
return exception;
}
/**
* Reactor event handler
* Publish our state to peer
* @param loop
* @param socket
* @param arg
* @return
     */
    public function sendState(loop:ZLoop, socket:ZMQSocket):Int {
statePub.sendMsg(Bytes.ofString(Std.string(Type.enumIndex(state))));
        return 0;
}
/**
* Reactor event handler
* Receive state from peer, execute finite state machine.
* @param loop
* @param socket
* @return
     */
    public function recvState(loop:ZLoop, socket:ZMQSocket):Int {
var message = stateSub.recvMsg().toString();
event = Type.createEnumIndex(EventT, Std.parseInt(message));
peerExpiry = Date.now().getTime() + (2 * BSTAR_HEARTBEAT);
return {
if (stateMachine())
                -1; // Error, so exit
            else
                0;
};
}
/**
* Application wants to speak to us, see if it's possible
* @param loop
* @param socket
* @return
     */
    public function voterReady(loop:ZLoop, socket:ZMQSocket):Int {
// If server can accept input now, call application handler
event = CLIENT_REQUEST;
if (stateMachine()) {
            // Destroy waiting message, no-one to read it
            var msg = socket.recvMsg();
} else {
if (verbose)
log("I: CLIENT REQUEST");
voterFn(loop, socket, voterArgs);
}
        return 0;
}
/**
* Start the reactor, ends if a callback function returns -1, or the
* process receives SIGINT or SIGTERM
* @return 0 if interrupted or invalid, -1 if cancelled by handler
     */
    public function start():Int {
if (voterFn != null && loop != null)
return loop.start();
        else
            return 0;
}
}
bstar: Binary Star core class in Java
package guide;

import org.zeromq.*;
import org.zeromq.ZLoop.IZLoopHandler;
import org.zeromq.ZMQ.PollItem;
import org.zeromq.ZMQ.Socket;
// bstar class - Binary Star reactor
public class bstar
{
// States we can be in at any point in time
enum State
{
STATE_PRIMARY, // Primary, waiting for peer to connect
STATE_BACKUP, // Backup, waiting for peer to connect
STATE_ACTIVE, // Active - accepting connections
STATE_PASSIVE // Passive - not accepting connections
}
// Events, which start with the states our peer can be in
enum Event
{
PEER_PRIMARY, // HA peer is pending primary
PEER_BACKUP, // HA peer is pending backup
PEER_ACTIVE, // HA peer is active
PEER_PASSIVE, // HA peer is passive
CLIENT_REQUEST // Client makes request
}
private ZContext ctx; // Our private context
private ZLoop loop; // Reactor loop
private Socket statepub; // State publisher
private Socket statesub; // State subscriber
private State state; // Current state
private Event event; // Current event
private long peerExpiry; // When peer is considered 'dead'
private ZLoop.IZLoopHandler voterFn; // Voting socket handler
private Object voterArg; // Arguments for voting handler
private ZLoop.IZLoopHandler activeFn; // Call when become active
private Object activeArg; // Arguments for handler
private ZLoop.IZLoopHandler passiveFn; // Call when become passive
private Object passiveArg; // Arguments for handler
// The finite-state machine is the same as in the proof-of-concept server.
// To understand this reactor in detail, first read the ZLoop class.
// .skip
// We send state information this often
// If peer doesn't respond in two heartbeats, it is 'dead'
private final static int BSTAR_HEARTBEAT = 1000; // In msecs
// Binary Star finite state machine (applies event to state)
// Returns false if there was an exception, true if event was valid.
private boolean execute()
{
boolean rc = true;
// Primary server is waiting for peer to connect
// Accepts CLIENT_REQUEST events in this state
if (state == State.STATE_PRIMARY) {
if (event == Event.PEER_BACKUP) {
System.out.printf("I: connected to backup (passive), ready active\n");
state = State.STATE_ACTIVE;
if (activeFn != null)
activeFn.handle(loop, null, activeArg);
}
else if (event == Event.PEER_ACTIVE) {
System.out.printf("I: connected to backup (active), ready passive\n");
state = State.STATE_PASSIVE;
if (passiveFn != null)
passiveFn.handle(loop, null, passiveArg);
}
else if (event == Event.CLIENT_REQUEST) {
// Allow client requests to turn us into the active if we've
// waited sufficiently long to believe the backup is not
// currently acting as active (i.e., after a failover)
assert (peerExpiry > 0);
if (System.currentTimeMillis() >= peerExpiry) {
System.out.printf("I: request from client, ready as active\n");
state = State.STATE_ACTIVE;
if (activeFn != null)
activeFn.handle(loop, null, activeArg);
}
else
// Don't respond to clients yet - it's possible we're
// performing a failback and the backup is currently active
rc = false;
}
}
else if (state == State.STATE_BACKUP) {
if (event == Event.PEER_ACTIVE) {
System.out.printf("I: connected to primary (active), ready passive\n");
state = State.STATE_PASSIVE;
if (passiveFn != null)
passiveFn.handle(loop, null, passiveArg);
}
else
// Reject client connections when acting as backup
if (event == Event.CLIENT_REQUEST)
rc = false;
}
else
// .split active and passive states
// These are the ACTIVE and PASSIVE states:
if (state == State.STATE_ACTIVE) {
if (event == Event.PEER_ACTIVE) {
// Two actives would mean split-brain
System.out.printf("E: fatal error - dual actives, aborting\n");
rc = false;
}
}
else
// Server is passive
// CLIENT_REQUEST events can trigger failover if peer looks dead
if (state == State.STATE_PASSIVE) {
if (event == Event.PEER_PRIMARY) {
// Peer is restarting - become active, peer will go passive
System.out.printf("I: primary (passive) is restarting, ready active\n");
state = State.STATE_ACTIVE;
}
else if (event == Event.PEER_BACKUP) {
// Peer is restarting - become active, peer will go passive
System.out.printf("I: backup (passive) is restarting, ready active\n");
state = State.STATE_ACTIVE;
}
else if (event == Event.PEER_PASSIVE) {
// Two passives would mean cluster would be non-responsive
System.out.printf("E: fatal error - dual passives, aborting\n");
rc = false;
}
else if (event == Event.CLIENT_REQUEST) {
// Peer becomes active if timeout has passed
// It's the client request that triggers the failover
assert (peerExpiry > 0);
if (System.currentTimeMillis() >= peerExpiry) {
// If peer is dead, switch to the active state
System.out.printf("I: failover successful, ready active\n");
state = State.STATE_ACTIVE;
}
else
// If peer is alive, reject connections
rc = false;
// Call state change handler if necessary
if (state == State.STATE_ACTIVE && activeFn != null)
activeFn.handle(loop, null, activeArg);
}
}
return rc;
}
private void updatePeerExpiry()
{
peerExpiry = System.currentTimeMillis() + 2 * BSTAR_HEARTBEAT;
}
// Reactor event handlers...
// Publish our state to peer
private static IZLoopHandler SendState = new IZLoopHandler()
{
@Override
public int handle(ZLoop loop, PollItem item, Object arg)
{
bstar self = (bstar) arg;
self.statepub.send(String.format("%d", self.state.ordinal()));
return 0;
}
};
// Receive state from peer, execute finite state machine
private static IZLoopHandler RecvState = new IZLoopHandler()
{
@Override
public int handle(ZLoop loop, PollItem item, Object arg)
{
bstar self = (bstar) arg;
String state = item.getSocket().recvStr();
if (state != null) {
self.event = Event.values()[Integer.parseInt(state)];
self.updatePeerExpiry();
}
return self.execute() ? 0 : -1;
}
};
// Application wants to speak to us, see if it's possible
private static IZLoopHandler VoterReady = new IZLoopHandler()
{
@Override
public int handle(ZLoop loop, PollItem item, Object arg)
{
bstar self = (bstar) arg;
// If server can accept input now, call appl handler
self.event = Event.CLIENT_REQUEST;
if (self.execute())
self.voterFn.handle(loop, item, self.voterArg);
else {
// Destroy waiting message, no-one to read it
ZMsg msg = ZMsg.recvMsg(item.getSocket());
msg.destroy();
}
return 0;
}
};
// .until
// .split constructor
// This is the constructor for our {{bstar}} class. We have to tell it
// whether we're primary or backup server, as well as our local and
// remote endpoints to bind and connect to:
public bstar(boolean primary, String local, String remote)
{
// Initialize the Binary Star
ctx = new ZContext();
loop = new ZLoop(ctx);
state = primary ? State.STATE_PRIMARY : State.STATE_BACKUP;
// Create publisher for state going to peer
statepub = ctx.createSocket(SocketType.PUB);
statepub.bind(local);
// Create subscriber for state coming from peer
statesub = ctx.createSocket(SocketType.SUB);
statesub.subscribe(ZMQ.SUBSCRIPTION_ALL);
statesub.connect(remote);
// Set-up basic reactor events
loop.addTimer(BSTAR_HEARTBEAT, 0, SendState, this);
PollItem poller = new PollItem(statesub, ZMQ.Poller.POLLIN);
loop.addPoller(poller, RecvState, this);
}
// .split destructor
// The destructor shuts down the bstar reactor:
public void destroy()
{
loop.destroy();
ctx.destroy();
}
// .split zloop method
// This method returns the underlying zloop reactor, so we can add
// additional timers and readers:
public ZLoop zloop()
{
return loop;
}
// .split voter method
// This method registers a client voter socket. Messages received
// on this socket provide the CLIENT_REQUEST events for the Binary Star
// FSM and are passed to the provided application handler. We require
// exactly one voter per {{bstar}} instance:
public int voter(String endpoint, SocketType type, IZLoopHandler handler, Object arg)
{
// Hold actual handler+arg so we can call this later
Socket socket = ctx.createSocket(type);
socket.bind(endpoint);
voterFn = handler;
voterArg = arg;
PollItem poller = new PollItem(socket, ZMQ.Poller.POLLIN);
return loop.addPoller(poller, VoterReady, this);
}
// .split register state-change handlers
// Register handlers to be called each time there's a state change:
public void newActive(IZLoopHandler handler, Object arg)
{
activeFn = handler;
activeArg = arg;
}
public void newPassive(IZLoopHandler handler, Object arg)
{
passiveFn = handler;
passiveArg = arg;
}
// .split enable/disable tracing
// Enable/disable verbose tracing, for debugging:
public void setVerbose(boolean verbose)
{
loop.verbose(verbose);
}
// .split start the reactor
// Finally, start the configured reactor. It will end if any handler
// returns -1 to the reactor, or if the process receives Interrupt
public int start()
{
assert (voterFn != null);
updatePeerExpiry();
return loop.start();
}
}
"""
Binary Star server
Author: Min RK <benjaminrk@gmail.com>
"""importtimeimportzmqfromzmq.eventloop.ioloopimport IOLoop, PeriodicCallback
fromzmq.eventloop.zmqstreamimport ZMQStream
# States we can be in at any point in time
STATE_PRIMARY = 1# Primary, waiting for peer to connect
STATE_BACKUP = 2# Backup, waiting for peer to connect
STATE_ACTIVE = 3# Active - accepting connections
STATE_PASSIVE = 4   # Passive - not accepting connections

# Events, which start with the states our peer can be in
PEER_PRIMARY = 1# HA peer is pending primary
PEER_BACKUP = 2# HA peer is pending backup
PEER_ACTIVE = 3# HA peer is active
PEER_PASSIVE = 4# HA peer is passive
CLIENT_REQUEST = 5  # Client makes request

# We send state information every this often
# If peer doesn't respond in two heartbeats, it is 'dead'
HEARTBEAT = 1000    # In msecs


class FSMError(Exception):
    """Exception class for invalid state"""
    pass


class BinaryStar(object):
def __init__(self, primary, local, remote):
# initialize the Binary Star
self.ctx = zmq.Context() # Our private context
self.loop = IOLoop.instance() # Reactor loop
self.state = STATE_PRIMARY if primary else STATE_BACKUP
self.event = None # Current event
self.peer_expiry = 0# When peer is considered 'dead'
self.voter_callback = None # Voting socket handler
self.master_callback = None # Call when become master
        self.slave_callback = None  # Call when become slave

        # Create publisher for state going to peer
self.statepub = self.ctx.socket(zmq.PUB)
self.statepub.bind(local)
# Create subscriber for state coming from peer
self.statesub = self.ctx.socket(zmq.SUB)
self.statesub.setsockopt_string(zmq.SUBSCRIBE, u'')
self.statesub.connect(remote)
# wrap statesub in ZMQStream for event triggers
self.statesub = ZMQStream(self.statesub, self.loop)
# setup basic reactor events
self.heartbeat = PeriodicCallback(self.send_state,
HEARTBEAT, self.loop)
self.statesub.on_recv(self.recv_state)
    def update_peer_expiry(self):
"""Update peer expiry time to be 2 heartbeats from now."""
self.peer_expiry = time.time() + 2e-3 * HEARTBEAT
    def start(self):
self.update_peer_expiry()
self.heartbeat.start()
return self.loop.start()
    def execute_fsm(self):
"""Binary Star finite state machine (applies event to state)
returns True if connections should be accepted, False otherwise.
"""
accept = True
if self.state == STATE_PRIMARY:
            # Primary server is waiting for peer to connect
            # Accepts CLIENT_REQUEST events in this state
            if self.event == PEER_BACKUP:
print("I: connected to backup (slave), ready as master")
self.state = STATE_ACTIVE
if self.master_callback:
self.loop.add_callback(self.master_callback)
elif self.event == PEER_ACTIVE:
print("I: connected to backup (master), ready as slave")
self.state = STATE_PASSIVE
if self.slave_callback:
self.loop.add_callback(self.slave_callback)
elif self.event == CLIENT_REQUEST:
if time.time() >= self.peer_expiry:
print("I: request from client, ready as master")
self.state = STATE_ACTIVE
if self.master_callback:
self.loop.add_callback(self.master_callback)
else:
                    # don't respond to clients yet - we don't know if
                    # the backup is currently Active as a result of
                    # a successful failover
accept = False
elif self.state == STATE_BACKUP:
            # Backup server is waiting for peer to connect
            # Rejects CLIENT_REQUEST events in this state
            if self.event == PEER_ACTIVE:
print("I: connected to primary (master), ready as slave")
self.state = STATE_PASSIVE
if self.slave_callback:
self.loop.add_callback(self.slave_callback)
elif self.event == CLIENT_REQUEST:
accept = False
elif self.state == STATE_ACTIVE:
            # Server is active
            # Accepts CLIENT_REQUEST events in this state
            # The only way out of ACTIVE is death
            if self.event == PEER_ACTIVE:
                # Two masters would mean split-brain
                print("E: fatal error - dual masters, aborting")
raise FSMError("Dual Masters")
elif self.state == STATE_PASSIVE:
            # Server is passive
            # CLIENT_REQUEST events can trigger failover if peer looks dead
            if self.event == PEER_PRIMARY:
                # Peer is restarting - become active, peer will go passive
                print("I: primary (slave) is restarting, ready as master")
self.state = STATE_ACTIVE
elif self.event == PEER_BACKUP:
# Peer is restarting - become active, peer will go passiveprint("I: backup (slave) is restarting, ready as master")
self.state = STATE_ACTIVE
elif self.event == PEER_PASSIVE:
                # Two passives would mean cluster would be non-responsive
                print("E: fatal error - dual slaves, aborting")
raise FSMError("Dual slaves")
elif self.event == CLIENT_REQUEST:
                # Peer becomes master if timeout has passed
                # It's the client request that triggers the failover
                assert self.peer_expiry > 0
                if time.time() >= self.peer_expiry:
                    # If peer is dead, switch to the active state
                    print("I: failover successful, ready as master")
self.state = STATE_ACTIVE
else:
# If peer is alive, reject connections
accept = False
        # Call state change handler if necessary
        if self.state == STATE_ACTIVE and self.master_callback:
self.loop.add_callback(self.master_callback)
return accept
    # ---------------------------------------------------------------------
    # Reactor event handlers...

    def send_state(self):
"""Publish our state to peer"""
self.statepub.send_string("%d" % self.state)
    def recv_state(self, msg):
"""Receive state from peer, execute finite state machine"""
state = msg[0]
if state:
self.event = int(state)
self.update_peer_expiry()
self.execute_fsm()
    def voter_ready(self, msg):
"""Application wants to speak to us, see if it's possible"""# If server can accept input now, call appl handler
self.event = CLIENT_REQUEST
if self.execute_fsm():
print("CLIENT REQUEST")
self.voter_callback(self.voter_socket, msg)
else:
            # Message will be ignored
            pass

    # -------------------------------------------------------------------------
    #
    def register_voter(self, endpoint, type, handler):
"""Create socket, bind to local endpoint, and register as reader for
voting. The socket will only be available if the Binary Star state
machine allows it. Input on the socket will act as a "vote" in the
Binary Star scheme. We require exactly one voter per bstar instance.
handler will always be called with two arguments: (socket,msg)
where socket is the one we are creating here, and msg is the message
that triggered the POLLIN event.
"""assert self.voter_callback is None
socket = self.ctx.socket(type)
socket.bind(endpoint)
self.voter_socket = socket
self.voter_callback = handler
stream = ZMQStream(socket, self.loop)
stream.on_recv(self.voter_ready)
# =====================================================================
# bstar - Binary Star reactor
# =====================================================================
package require TclOO
package require mdp
package require zmq
package provide BStar 1.0

# We send state information every this often
# If peer doesn't respond in two heartbeats, it is 'dead'
set BSTAR_HEARTBEAT 1000;# In msecs
# States we can be in at any point in time
# STATE(NONE) 0
# STATE(PRIMARY) 1 ;# Primary, waiting for peer to connect
# STATE(BACKUP) 2 ;# Backup, waiting for peer to connect
# STATE(ACTIVE) 3 ;# Active - accepting connections
# STATE(PASSIVE) 4 ;# Passive - not accepting connections
# Events, which start with the states our peer can be in
# EVENT(NONE) 0
# EVENT(PRIMARY) 1 ;# HA peer is pending primary
# EVENT(BACKUP) 2 ;# HA peer is pending backup
# EVENT(ACTIVE) 3 ;# HA peer is active
# EVENT(PASSIVE) 4 ;# HA peer is passive
# EVENT(REQUEST) 5 ;# Client makes request
oo::class create BStar {variable verbose ctx statepub statesub voter state event peer_expiry voterfn masterfn slavefn
constructor{istate local remote iverbose}{# Initialize the Binary Star
set verbose $iverboseset ctx [zmq context bstar_context_[::mdp::contextid]]set state $istateset event NONE
set peer_expiry 0set voterfn {}set masterfn {}set slavefn {}# Create publisher for state going to peer
    set statepub [zmq socket bstar_socket_[::mdp::socketid] $ctx PUB]
    $statepub bind $local

    # Create subscriber for state coming from peer
    set statesub [zmq socket bstar_socket_[::mdp::socketid] $ctx SUB]
    $statesub setsockopt SUBSCRIBE ""
    $statesub connect $remote
}

destructor {
    $statesub close
    $statepub close
    $ctx term
}

method voter_callback {} {
    if {[llength $voterfn]} {
        {*}$voterfn $voter
    }
}

method master_callback {} {
    if {[llength $masterfn]} {
        {*}$masterfn
    }
}

method slave_callback {} {
    if {[llength $slavefn]} {
        {*}$slavefn
    }
}

method log {msg} {
    if {$verbose} {
        puts "[clock format [clock seconds]] $msg"
    }
}

method execute_fsm {} {
    set rc 0
    if {$state eq "PRIMARY"} {
        # Primary server is waiting for peer to connect
        # Accepts CLIENT_REQUEST events in this state
        if {$event eq "BACKUP"} {
            my log "I: connected to backup (slave), ready as master"
            set state ACTIVE
            my master_callback
        } elseif {$event eq "ACTIVE"} {
            my log "I: connected to backup (master), ready as slave"
            set state PASSIVE
            my slave_callback
        } elseif {$event eq "REQUEST"} {
            # Allow client requests to turn us into the master if we've
            # waited sufficiently long to believe the backup is not
            # currently acting as master (i.e., after a failover)
            if {$peer_expiry <= 0} {
                error "expected peer_expiry > 0"
            }
            if {[clock milliseconds] >= $peer_expiry} {
                my log "I: request from client, ready as master"
                set state ACTIVE
                my master_callback
            } else {
                # Don't respond to clients yet - it's possible we're
                # performing a failback and the backup is currently master
                set rc -1
            }
        }
    } elseif {$state eq "BACKUP"} {
        # Backup server is waiting for peer to connect
        # Rejects CLIENT_REQUEST events in this state
        if {$event eq "ACTIVE"} {
            my log "I: connected to primary (master), ready as slave"
            set state PASSIVE
            my slave_callback
        } elseif {$event eq "REQUEST"} {
            set rc -1
        }
    } elseif {$state eq "ACTIVE"} {
        # Server is active
        # Accepts CLIENT_REQUEST events in this state
        # The only way out of ACTIVE is death
        if {$event eq "ACTIVE"} {
            my log "E: fatal error - dual masters, aborting"
            set rc -1
        }
    } elseif {$state eq "PASSIVE"} {
        # Server is passive
        # CLIENT_REQUEST events can trigger failover if peer looks dead
        if {$event eq "PRIMARY"} {
            # Peer is restarting - become active, peer will go passive
            my log "I: primary (slave) is restarting, ready as master"
            set state ACTIVE
        } elseif {$event eq "BACKUP"} {
            # Peer is restarting - become active, peer will go passive
            my log "I: backup (slave) is restarting, ready as master"
            set state ACTIVE
        } elseif {$event eq "PASSIVE"} {
            # Two passives would mean cluster would be non-responsive
            my log "E: fatal error - dual slaves, aborting"
            set rc -1
        } elseif {$event eq "REQUEST"} {
            # Peer becomes master if timeout has passed
            # It's the client request that triggers the failover
            if {$peer_expiry < 0} {
                error "expected peer_expiry >= 0"
            }
            if {[clock milliseconds] >= $peer_expiry} {
                # If peer is dead, switch to the active state
                my log "I: failover successful, ready as master"
                set state ACTIVE
            } else {
                # If peer is alive, reject connections
                set rc -1
            }
        }
        if {$state eq "ACTIVE"} {
            my master_callback
        }
    }
    return $rc
}

method update_peer_expiry {} {
    set peer_expiry [expr {[clock milliseconds] + 2 * $::BSTAR_HEARTBEAT}]
}

# Reactor event handlers...

# Publish our state to peer
method send_state {} {
    my log "I: send state $state to peer"
    $statepub send $state
    after $::BSTAR_HEARTBEAT [list [self] send_state]
}

# Receive state from peer, execute finite state machine
method recv_state {} {
    set nstate [$statesub recv]
    my log "I: got state $nstate from peer"
    set event $nstate
    my update_peer_expiry
    my execute_fsm
}

# Application wants to speak to us, see if it's possible
method voter_ready {} {
    # If server can accept input now, call appl handler
    set event REQUEST
    if {[my execute_fsm] == 0} {
        puts "CLIENT REQUEST"
        my voter_callback
    } else {
        # Destroy waiting message, no-one to read it
        zmsg recv $voter
    }
}

# Create socket, bind to local endpoint, and register as reader for
# voting. The socket will only be available if the Binary Star state
# machine allows it. Input on the socket will act as a "vote" in the
# Binary Star scheme. We require exactly one voter per bstar instance.
method voter {endpoint type handler} {
    # Hold actual handler+arg so we can call this later
    set voter [zmq socket bstar_socket_[::mdp::socketid] $ctx $type]
    $voter bind $endpoint
    set voterfn $handler
}

# Register state change handlers
method new_master {handler} {
    set masterfn $handler
}

method new_slave {handler} {
    set slavefn $handler
}

# Enable/disable verbose tracing
method set_verbose {iverbose} {
    set verbose $iverbose
}

# Start the reactor, ends if a callback function returns -1
method start {} {
    my update_peer_expiry
    # Set-up reactor events
    $statesub readable [list [self] recv_state]
    $voter readable [list [self] voter_ready]
    after $::BSTAR_HEARTBEAT [list [self] send_state]
}
}
It might seem ironic to focus so much on broker-based reliability, when we often explain ZeroMQ as “brokerless messaging”. However, in messaging, as in real life, the middleman is both a burden and a benefit. In practice, most messaging architectures benefit from a mix of distributed and brokered messaging. You get the best results when you can decide freely what trade-offs you want to make. This is why I can drive twenty minutes to a wholesaler to buy five cases of wine for a party, but I can also walk ten minutes to a corner store to buy one bottle for a dinner. Our highly context-sensitive relative valuations of time, energy, and cost are essential to the real world economy. And they are essential to an optimal message-based architecture.
This is why ZeroMQ does not impose a broker-centric architecture, though it does give you the tools to build brokers, aka proxies, and we’ve built a dozen or so different ones so far, just for practice.
So we’ll end this chapter by deconstructing the broker-based reliability we’ve built so far, and turning it back into a distributed peer-to-peer architecture I call the Freelance pattern. Our use case will be a name resolution service. This is a common problem with ZeroMQ architectures: how do we know the endpoint to connect to? Hard-coding TCP/IP addresses in code is insanely fragile. Using configuration files creates an administration nightmare. Imagine if you had to hand-configure your web browser, on every PC or mobile phone you used, to realize that “google.com” was “74.125.230.82”.
A ZeroMQ name service (and we’ll make a simple implementation) must do the following:
Resolve a logical name into at least a bind endpoint and a connect endpoint. A realistic name service would provide multiple bind endpoints, and possibly multiple connect endpoints as well.
Allow us to manage multiple parallel environments, e.g., “test” versus “production”, without modifying code.
Be reliable, because if it is unavailable, applications won’t be able to connect to the network.
Putting a name service behind a service-oriented Majordomo broker is clever from some points of view. However, it’s simpler and much less surprising to just expose the name service as a server to which clients can connect directly. If we do this right, the name service becomes the only global network endpoint we need to hard-code in our code or configuration files.
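To make the use case concrete before we worry about reliability, here is a rough sketch, in C with the CZMQ helpers used throughout this chapter, of what a lookup against such a name service could look like on the wire. The two-frame request and reply layout, the endpoint, and the service name are invented for illustration; they are not part of the examples that follow:
//  Hypothetical name lookup (illustration only, not one of this chapter's
//  listings). Request = [environment][logical name], reply = [status][endpoint].
#include "czmq.h"

int main (void)
{
    zctx_t *ctx = zctx_new ();
    void *client = zsocket_new (ctx, ZMQ_REQ);
    //  The one endpoint we allow ourselves to hard-code (placeholder)
    zsocket_connect (client, "tcp://localhost:5555");

    zmsg_t *request = zmsg_new ();
    zmsg_addstr (request, "production");      //  Environment (assumed frame)
    zmsg_addstr (request, "echo-service");    //  Logical service name (assumed frame)
    zmsg_send (&request, client);

    zmsg_t *reply = zmsg_recv (client);
    if (reply) {
        char *status = zmsg_popstr (reply);
        char *endpoint = zmsg_popstr (reply);
        printf ("I: echo-service resolves to %s (%s)\n", endpoint, status);
        free (endpoint);
        free (status);
        zmsg_destroy (&reply);
    }
    zctx_destroy (&ctx);
    return 0;
}
The point is only that the client needs exactly one well-known endpoint; everything else it learns by asking.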
Figure 55 - The Freelance Pattern
The types of failure we aim to handle are server crashes and restarts, server busy looping, server overload, and network issues. To get reliability, we’ll create a pool of name servers so if one crashes or goes away, clients can connect to another, and so on. In practice, two would be enough. But for the example, we’ll assume the pool can be any size.
In this architecture, a large set of clients connect to a small set of servers directly. The servers bind to their respective addresses. It’s fundamentally different from a broker-based approach like Majordomo, where workers connect to the broker. Clients have a few options:
Use REQ sockets and the Lazy Pirate pattern. Easy, but would need some additional intelligence so clients don’t stupidly try to reconnect to dead servers over and over.
Use DEALER sockets and blast out requests (which will be load balanced to all connected servers) until they get a reply. Effective, but not elegant.
Use ROUTER sockets so clients can address specific servers. But how does the client know the identity of the server sockets? Either the server has to ping the client first (complex), or the server has to use a hard-coded, fixed identity known to the client (nasty).
We’ll develop each of these in the following subsections.
So our menu appears to offer: simple, brutal, complex, or nasty. Let’s start with simple and then work out the kinks. We take Lazy Pirate and rewrite it to work with multiple server endpoints.
Start one or several servers first, specifying a bind endpoint as the argument:
## Freelance server - Model 1
# Trivial echo service
#
package require zmq
if {[llength $argv] != 1} {
    puts "Usage: flserver1.tcl <endpoint>"
    exit 1
}

zmq context context
zmq socket server context REP
server bind [lindex $argv 0]
puts "I: echo service is ready at [lindex $argv 0]"

while {1} {
    set msg [zmsg recv server]
    if {[llength $msg] == 0} {
        break
    }
    zmsg send server $msg
}
server close
context term
// Freelance client - Model 1
// Uses REQ socket to query one or more services
#include"czmq.h"#define REQUEST_TIMEOUT 1000
#define MAX_RETRIES 3 // Before we abandon
static zmsg_t *
s_try_request (zctx_t *ctx, char *endpoint, zmsg_t *request)
{
printf ("I: trying echo service at %s...\n", endpoint);
void *client = zsocket_new (ctx, ZMQ_REQ);
zsocket_connect (client, endpoint);
// Send request, wait safely for reply
zmsg_t *msg = zmsg_dup (request);
zmsg_send (&msg, client);
zmq_pollitem_t items [] = { { client, 0, ZMQ_POLLIN, 0 } };
zmq_poll (items, 1, REQUEST_TIMEOUT * ZMQ_POLL_MSEC);
zmsg_t *reply = NULL;
if (items [0].revents & ZMQ_POLLIN)
reply = zmsg_recv (client);
// Close socket in any case, we're done with it now
zsocket_destroy (ctx, client);
return reply;
}
// .split client task
// The client uses a Lazy Pirate strategy if it only has one server to talk
// to. If it has two or more servers to talk to, it will try each server just
// once:
int main (int argc, char *argv [])
{
zctx_t *ctx = zctx_new ();
zmsg_t *request = zmsg_new ();
zmsg_addstr (request, "Hello world");
zmsg_t *reply = NULL;
int endpoints = argc - 1;
if (endpoints == 0)
printf ("I: syntax: %s <endpoint> ...\n", argv [0]);
else if (endpoints == 1) {
// For one endpoint, we retry N times
int retries;
for (retries = 0; retries < MAX_RETRIES; retries++) {
char *endpoint = argv [1];
reply = s_try_request (ctx, endpoint, request);
if (reply)
break; // Successful
printf ("W: no response from %s, retrying...\n", endpoint);
}
}
else {
// For multiple endpoints, try each at most once
int endpoint_nbr;
for (endpoint_nbr = 0; endpoint_nbr < endpoints; endpoint_nbr++) {
char *endpoint = argv [endpoint_nbr + 1];
reply = s_try_request (ctx, endpoint, request);
if (reply)
break; // Successful
printf ("W: no response from %s\n", endpoint);
}
}
if (reply)
printf ("Service is running OK\n");
zmsg_destroy (&request);
zmsg_destroy (&reply);
zctx_destroy (&ctx);
return 0;
}
flclient1: Freelance client, Model One in C++
// Freelance client - Model 1
// Uses REQ socket to query one or more services
#include <iostream>
#include <zmq.hpp>
#include <zmq_addon.hpp>

const int REQUEST_TIMEOUT = 1000;
const int MAX_RETRIES = 3; // Before we abandon
static std::unique_ptr<zmq::message_t> s_try_request(zmq::context_t &context,
const std::string &endpoint,
const zmq::const_buffer &request) {
std::cout << "I: trying echo service at " << endpoint << std::endl;
zmq::socket_t client(context, zmq::socket_type::req);
// Set ZMQ_LINGER to REQUEST_TIMEOUT milliseconds, otherwise if we send a message to a server
// that is not working properly or even not exist, we may never be able to exit the program
client.setsockopt(ZMQ_LINGER, REQUEST_TIMEOUT);
client.connect(endpoint);
// Send request, wait safely for reply
zmq::message_t message(request.data(), request.size());
client.send(message, zmq::send_flags::none);
zmq_pollitem_t items[] = {{client, 0, ZMQ_POLLIN, 0}};
zmq::poll(items, 1, REQUEST_TIMEOUT);
std::unique_ptr<zmq::message_t> reply = std::make_unique<zmq::message_t>();
zmq::recv_result_t recv_result;
if (items[0].revents & ZMQ_POLLIN) recv_result = client.recv(*reply, zmq::recv_flags::none);
if (!recv_result) {
reply.release();
}
return reply;
}
// .split client task
// The client uses a Lazy Pirate strategy if it only has one server to talk
// to. If it has two or more servers to talk to, it will try each server just
// once:
int main(int argc, char *argv[]) {
zmq::context_t context{1};
zmq::const_buffer request = zmq::str_buffer("Hello World!");
std::unique_ptr<zmq::message_t> reply;
int endpoints = argc - 1;
if (endpoints == 0)
std::cout << "I: syntax: " << argv[0] << "<endpoint> ..." << std::endl;
elseif (endpoints == 1) {
// For one endpoint, we retry N times
int retries;
for (retries = 0; retries < MAX_RETRIES; retries++) {
std::string endpoint = std::string(argv[1]);
reply = s_try_request(context, endpoint, request);
if (reply) break; // Successful
std::cout << "W: no response from " << endpoint << " retrying...\n" << std::endl;
}
} else {
// For multiple endpoints, try each at most once
int endpoint_nbr;
for (endpoint_nbr = 0; endpoint_nbr < endpoints; endpoint_nbr++) {
std::string endpoint = std::string(argv[endpoint_nbr + 1]);
reply = s_try_request(context, endpoint, request);
if (reply) break; // Successful
std::cout << "W: no response from " << endpoint << std::endl;
}
}
if (reply)
std::cout << "Service is running OK. Received message: " << reply->to_string() << std::endl;
return 0;
}
package guide;

import org.zeromq.SocketType;
import org.zeromq.ZContext;
import org.zeromq.ZMQ;
import org.zeromq.ZMQ.Poller;
import org.zeromq.ZMQ.Socket;
import org.zeromq.ZMsg;

// Freelance client - Model 1
// Uses REQ socket to query one or more services
public class flclient1
{
private static final int REQUEST_TIMEOUT = 1000;
private static final int MAX_RETRIES = 3; // Before we abandon
private static ZMsg tryRequest(ZContext ctx, String endpoint, ZMsg request)
{
System.out.printf("I: trying echo service at %s...\n", endpoint);
Socket client = ctx.createSocket(SocketType.REQ);
client.connect(endpoint);
// Send request, wait safely for reply
ZMsg msg = request.duplicate();
msg.send(client);
Poller poller = ctx.createPoller(1);
poller.register(client, Poller.POLLIN);
poller.poll(REQUEST_TIMEOUT);
ZMsg reply = null;
if (poller.pollin(0))
reply = ZMsg.recvMsg(client);
// Close socket in any case, we're done with it now
ctx.destroySocket(client);
poller.close();
return reply;
}
// .split client task
// The client uses a Lazy Pirate strategy if it only has one server to talk
// to. If it has two or more servers to talk to, it will try each server just
// once:
public static void main(String[] argv)
{
try (ZContext ctx = new ZContext()) {
ZMsg request = new ZMsg();
request.add("Hello world");
ZMsg reply = null;
int endpoints = argv.length;
if (endpoints == 0)
System.out.printf("I: syntax: flclient1 <endpoint> ...\n");
else if (endpoints == 1) {
// For one endpoint, we retry N times
int retries;
for (retries = 0; retries < MAX_RETRIES; retries++) {
String endpoint = argv[0];
reply = tryRequest(ctx, endpoint, request);
if (reply != null)
break; // Successful
System.out.printf(
"W: no response from %s, retrying...\n", endpoint
);
}
}
else {
// For multiple endpoints, try each at most once
int endpointNbr;
for (endpointNbr = 0; endpointNbr < endpoints; endpointNbr++) {
String endpoint = argv[endpointNbr];
reply = tryRequest(ctx, endpoint, request);
if (reply != null)
break; // Successful
System.out.printf("W: no response from %s\n", endpoint);
}
}
if (reply != null) {
System.out.printf("Service is running OK\n");
reply.destroy();
}
request.destroy();
}
}
}
## Freelance client - Model 1
# Uses REQ socket to query one or more services
#
package require zmq
set REQUEST_TIMEOUT 1000
set MAX_RETRIES 3 ;# Before we abandon

if {[llength $argv] == 0} {
    puts "Usage: flclient1.tcl <endpoint> ..."
    exit 1
}

proc s_try_request {ctx endpoint request} {
    puts "I: trying echo service at $endpoint..."
    zmq socket client $ctx REQ
    client connect $endpoint

    # Send request, wait safely for reply
    zmsg send client $request
    set reply {}
    set rpoll_set [zmq poll {{client {POLLIN}}} $::REQUEST_TIMEOUT]
    if {[llength $rpoll_set] && "POLLIN" in [lindex $rpoll_set 0 1]} {
        set reply [zmsg recv client]
    }
    # Close socket in any case, we're done with it now
    client setsockopt LINGER 0
    client close
    return $reply
}

zmq context context

set request {}
set request [zmsg add $request "Hello World"]
set reply {}

if {[llength $argv] == 1} {
    # For one endpoint, we retry N times
    set endpoint [lindex $argv 0]
    for {set retries 0} {$retries < $MAX_RETRIES} {incr retries} {
        set reply [s_try_request context $endpoint $request]
        if {[llength $reply]} {
            break ;# Successful
        }
        puts "W: no response from $endpoint, retrying..."
    }
} else {
    # For multiple endpoints, try each at most once
    foreach endpoint $argv {
        set reply [s_try_request context $endpoint $request]
        if {[llength $reply]} {
            break ;# Successful
        }
        puts "W: no response from $endpoint"
    }
}

if {[llength $reply]} {
    puts "Service is running OK"
}
context term
Although the basic approach is Lazy Pirate, the client aims to just get one successful reply. It has two techniques, depending on whether you are running a single server or multiple servers:
With a single server, the client will retry several times, exactly as for Lazy Pirate.
With multiple servers, the client will try each server at most once until it’s received a reply or has tried all servers.
This solves the main weakness of Lazy Pirate, namely that it could not fail over to backup or alternate servers.
However, this design won’t work well in a real application. If we’re connecting many sockets and our primary name server is down, we’re going to experience this painful timeout each time.
Let’s switch our client to using a DEALER socket. Our goal here is to make sure we get a reply back within the shortest possible time, no matter whether a particular server is up or down. Our client takes this approach:
We set things up, connecting to all servers.
When we have a request, we blast it out as many times as we have servers.
We wait for the first reply, and take that.
We ignore any other replies.
What will happen in practice is that when all servers are running, ZeroMQ will distribute the requests so that each server gets one request and sends one reply. When any server is offline and disconnected, ZeroMQ will distribute the requests to the remaining servers. So a server may in some cases get the same request more than once.
What’s more annoying for the client is that we’ll get multiple replies back, but there’s no guarantee we’ll get a precise number of replies. Requests and replies can get lost (e.g., if the server crashes while processing a request).
So we have to number requests and ignore any replies that don’t match the request number. Our Model One server will work because it’s an echo server, but coincidence is not a great basis for understanding. So we’ll make a Model Two server that chews up the message and returns a correctly numbered reply with the content “OK”. We’ll use messages consisting of two parts: a sequence number and a body.
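Before the full Model Two client, here is a minimal sketch of just that matching step, assuming a connected DEALER socket and the CZMQ helpers used in the listings. The helper name is invented, and unlike the real client below it blocks without a timeout:
#include "czmq.h"

//  Illustration only: wait for the reply whose sequence number matches the
//  request we just sent, discarding any stale replies that arrive first.
static zmsg_t *
s_recv_matching (void *socket, int sequence)
{
    while (true) {
        zmsg_t *reply = zmsg_recv (socket);
        if (!reply)
            return NULL;                //  Interrupted
        //  Reply is [empty][sequence][body]; drop the envelope delimiter
        free (zmsg_popstr (reply));
        char *reply_sequence = zmsg_popstr (reply);
        int sequence_nbr = atoi (reply_sequence);
        free (reply_sequence);
        if (sequence_nbr == sequence)
            return reply;               //  The reply we were waiting for
        zmsg_destroy (&reply);          //  Stale reply to an older request
    }
}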
Start one or more servers, specifying a bind endpoint each time:
--
--  Freelance server - Model 2
--  Does some work, replies OK, with message sequencing
--
--  Author: Robert G. Jakabosky <bobby@sharedrealm.com>
--
require"zmq"
require"zmsg"

if (#arg < 1) then
    printf ("I: syntax: %s <endpoint>\n", arg[0])
    os.exit (0)
end

local context = zmq.init(1)
s_catch_signals()

local server = context:socket(zmq.REP)
server:bind(arg[1])

printf ("I: service is ready at %s\n", arg[1])
while (not s_interrupted) do
    local msg, err = zmsg.recv(server)
    if err then
        print('recv error:', err)
        break -- Interrupted
    end
    -- Fail nastily if run against wrong client
    assert (msg:parts() == 2)
    msg:body_set("OK")
    msg:send(server)
end
if (s_interrupted) then
    printf("W: interrupted\n")
end
server:close()
context:term()
## Freelance server - Model 2
# Does some work, replies OK, with message sequencing
#
package require zmq
if {[llength $argv] != 1} {
    puts "Usage: flserver2.tcl <endpoint>"
    exit 1
}

zmq context context
zmq socket server context REP
server bind [lindex $argv 0]
puts "I: echo service is ready at [lindex $argv 0]"

while {1} {
    set request [zmsg recv server]
    if {[llength $request] == 0} {
        break
    }
    # Fail nastily if run against wrong client
    if {[llength $request] != 2} {
        error "request with length 2 expected"
    }
    set address [zmsg pop request]
    set reply {}
    set reply [zmsg add $reply $address]
    set reply [zmsg add $reply "OK"]
    zmsg send server $reply
}
server close
context term
// Freelance client - Model 2
// Uses DEALER socket to blast one or more services
#include"czmq.h"// We design our client API as a class, using the CZMQ style
#ifdef __cplusplus
extern"C" {
#endif
typedef struct _flclient_t flclient_t;
flclient_t *flclient_new (void);
void flclient_destroy (flclient_t **self_p);
void flclient_connect (flclient_t *self, char *endpoint);
zmsg_t *flclient_request (flclient_t *self, zmsg_t **request_p);
#ifdef __cplusplus
}
#endif
// If not a single service replies within this time, give up
#define GLOBAL_TIMEOUT 2500
int main (int argc, char *argv [])
{
if (argc == 1) {
printf ("I: syntax: %s <endpoint> ...\n", argv [0]);
return 0;
}
// Create new freelance client object
flclient_t *client = flclient_new ();
// Connect to each endpoint
int argn;
for (argn = 1; argn < argc; argn++)
flclient_connect (client, argv [argn]);
// Send a bunch of name resolution 'requests', measure time
int requests = 10000;
uint64_t start = zclock_time ();
while (requests--) {
zmsg_t *request = zmsg_new ();
zmsg_addstr (request, "random name");
zmsg_t *reply = flclient_request (client, &request);
if (!reply) {
printf ("E: name service not available, aborting\n");
break;
}
zmsg_destroy (&reply);
}
printf ("Average round trip cost: %d usec\n",
(int) (zclock_time () - start) / 10);
flclient_destroy (&client);
return 0;
}
// .split class implementation
// Here is the {{flclient}} class implementation. Each instance has a
// context, a DEALER socket it uses to talk to the servers, a counter
// of how many servers it's connected to, and a request sequence number:
struct _flclient_t {
zctx_t *ctx; // Our context wrapper
void *socket; // DEALER socket talking to servers
size_t servers; // How many servers we have connected to
uint sequence; // Number of requests ever sent
};
// Constructor
flclient_t *
flclient_new (void)
{
flclient_t
*self;
self = (flclient_t *) zmalloc (sizeof (flclient_t));
self->ctx = zctx_new ();
self->socket = zsocket_new (self->ctx, ZMQ_DEALER);
return self;
}
// Destructor
void flclient_destroy (flclient_t **self_p)
{
assert (self_p);
if (*self_p) {
flclient_t *self = *self_p;
zctx_destroy (&self->ctx);
free (self);
*self_p = NULL;
}
}
// Connect to new server endpoint
void flclient_connect (flclient_t *self, char *endpoint)
{
assert (self);
zsocket_connect (self->socket, endpoint);
self->servers++;
}
// .split request method
// This method does the hard work. It sends a request to all
// connected servers in parallel (for this to work, all connections
// must be successful and completed by this time). It then waits
// for a single successful reply, and returns that to the caller.
// Any other replies are just dropped:
zmsg_t *
flclient_request (flclient_t *self, zmsg_t **request_p)
{
assert (self);
assert (*request_p);
zmsg_t *request = *request_p;
// Prefix request with sequence number and empty envelope
char sequence_text [10];
sprintf (sequence_text, "%u", ++self->sequence);
zmsg_pushstr (request, sequence_text);
zmsg_pushstr (request, "");
// Blast the request to all connected servers
int server;
for (server = 0; server < self->servers; server++) {
zmsg_t *msg = zmsg_dup (request);
zmsg_send (&msg, self->socket);
}
// Wait for a matching reply to arrive from anywhere
// Since we can poll several times, calculate each one
zmsg_t *reply = NULL;
uint64_t endtime = zclock_time () + GLOBAL_TIMEOUT;
while (zclock_time () < endtime) {
zmq_pollitem_t items [] = { { self->socket, 0, ZMQ_POLLIN, 0 } };
zmq_poll (items, 1, (endtime - zclock_time ()) * ZMQ_POLL_MSEC);
if (items [0].revents & ZMQ_POLLIN) {
// Reply is [empty][sequence][OK]
reply = zmsg_recv (self->socket);
assert (zmsg_size (reply) == 3);
free (zmsg_popstr (reply));
char *sequence = zmsg_popstr (reply);
int sequence_nbr = atoi (sequence);
free (sequence);
if (sequence_nbr == self->sequence)
break;
zmsg_destroy (&reply);
}
}
zmsg_destroy (request_p);
return reply;
}
flclient2: Freelance client, Model Two in C++
// Freelance client - Model 2
// Uses DEALER socket to blast one or more services
#include <chrono>
#include <iostream>
#include <memory>
#include <zmqpp/zmqpp.hpp>

// If not a single service replies within this time, give up
const int GLOBAL_TIMEOUT = 2500;
// Total number of requests to send
const int TOTAL_REQUESTS = 10000;
// .split class implementation
// Here is the {{flclient}} class implementation. Each instance has a
// context, a DEALER socket it uses to talk to the servers, a counter
// of how many servers it's connected to, and a request sequence number:
class flclient {
public:
flclient();
~flclient() {}
void connect(const std::string &endpoint);
std::unique_ptr<zmqpp::message> request(zmqpp::message &request);
private:
zmqpp::context context_; // Our context
zmqpp::socket socket_; // DEALER socket talking to servers
size_t servers_; // How many servers we have connected to
uint sequence_; // Number of requests ever sent
};
// Constructor
flclient::flclient() : socket_(context_, zmqpp::socket_type::dealer) {
socket_.set(zmqpp::socket_option::linger, GLOBAL_TIMEOUT);
servers_ = 0;
sequence_ = 0;
}
// Connect to new server endpoint
void flclient::connect(const std::string &endpoint) {
socket_.connect(endpoint);
servers_++;
}
// .split request method
// This method does the hard work. It sends a request to all
// connected servers in parallel (for this to work, all connections
// must be successful and completed by this time). It then waits
// for a single successful reply, and returns that to the caller.
// Any other replies are just dropped:
std::unique_ptr<zmqpp::message> flclient::request(zmqpp::message &request) {
// Prefix request with sequence number and empty envelope
request.push_front(++sequence_);
request.push_front("");
// Blast the request to all connected servers
size_t server;
for (server = 0; server < servers_; server++) {
zmqpp::message msg;
msg.copy(request);
socket_.send(msg);
}
// Wait for a matching reply to arrive from anywhere
// Since we can poll several times, calculate each one
std::unique_ptr<zmqpp::message> reply;
zmqpp::poller poller;
poller.add(socket_, zmqpp::poller::poll_in);
auto endTime = std::chrono::system_clock::now() + std::chrono::milliseconds(GLOBAL_TIMEOUT);
while (std::chrono::system_clock::now() < endTime) {
int milliSecondsToWait = std::chrono::duration_cast<std::chrono::milliseconds>(
endTime - std::chrono::system_clock::now())
.count();
if (poller.poll(milliSecondsToWait)) {
if (poller.has_input(socket_)) {
reply = std::make_unique<zmqpp::message>();
// Reply is [empty][sequence][OK]
socket_.receive(*reply);
assert(reply->parts() == 3);
reply->pop_front();
uint sequence;
reply->get(sequence, 0);
reply->pop_front();
// std::cout << "Current sequence: " << sequence_ << ", Server reply: " << sequence
// << std::endl;
if (sequence == sequence_)
break;
else
reply.release();
}
}
}
return reply;
}
int main(int argc, char *argv[]) {
if (argc == 1) {
std::cout << "I: syntax: " << argv[0] << " <endpoint> ..." << std::endl;
return 0;
}
// Create new freelance client object
flclient client;
// Connect to each endpoint
int argn;
for (argn = 1; argn < argc; argn++) client.connect(argv[argn]);
// Send a bunch of name resolution 'requests', measure time
int requests = TOTAL_REQUESTS;
auto startTime = std::chrono::steady_clock::now();
while (requests--) {
zmqpp::message request;
request.push_back("random name");
std::unique_ptr<zmqpp::message> reply;
reply = client.request(request);
if (!reply) {
std::cout << "E: name service not available, aborting" << std::endl;
break;
}
}
auto endTime = std::chrono::steady_clock::now();
std::cout
<< "Average round trip cost: "
<< std::chrono::duration_cast<std::chrono::microseconds>(endTime - startTime).count() /
TOTAL_REQUESTS
<< " µs" << std::endl;
return 0;
}
package guide;

import org.zeromq.SocketType;
import org.zeromq.ZContext;
import org.zeromq.ZMQ;
import org.zeromq.ZMQ.Poller;
import org.zeromq.ZMQ.Socket;
import org.zeromq.ZMsg;

// Freelance client - Model 2
// Uses DEALER socket to blast one or more services
public class flclient2
{
// If not a single service replies within this time, give up
private static final int GLOBAL_TIMEOUT = 2500;
// .split class implementation
// Here is the {{flclient}} class implementation. Each instance has a
// context, a DEALER socket it uses to talk to the servers, a counter
// of how many servers it's connected to, and a request getSequence number:
private ZContext ctx; // Our context wrapper
private Socket socket; // DEALER socket talking to servers
private int servers; // How many servers we have connected to
private int sequence; // Number of requests ever sent
public flclient2()
{
ctx = new ZContext();
socket = ctx.createSocket(SocketType.DEALER);
}
public void destroy()
{
ctx.destroy();
}
private void connect(String endpoint)
{
socket.connect(endpoint);
servers++;
}
private ZMsg request(ZMsg request)
{
// Prefix request with getSequence number and empty envelope
String sequenceText = String.format("%d", ++sequence);
request.push(sequenceText);
request.push("");
// Blast the request to all connected servers
int server;
for (server = 0; server < servers; server++) {
ZMsg msg = request.duplicate();
msg.send(socket);
}
// Wait for a matching reply to arrive from anywhere
// Since we can poll several times, calculate each one
ZMsg reply = null;
long endtime = System.currentTimeMillis() + GLOBAL_TIMEOUT;
Poller poller = ctx.createPoller(1);
poller.register(socket, Poller.POLLIN);
while (System.currentTimeMillis() < endtime) {
poller.poll(endtime - System.currentTimeMillis());
if (poller.pollin(0)) {
// Reply is [empty][getSequence][OK]
reply = ZMsg.recvMsg(socket);
assert (reply.size() == 3);
reply.pop();
String sequenceStr = reply.popString();
int sequenceNbr = Integer.parseInt(sequenceStr);
if (sequenceNbr == sequence)
break;
reply.destroy();
}
}
poller.close();
request.destroy();
return reply;
}
public static void main(String[] argv)
{
if (argv.length == 0) {
System.out.printf("I: syntax: flclient2 <endpoint> ...\n");
System.exit(0);
}
// Create new freelance client object
flclient2 client = new flclient2();
// Connect to each endpoint
int argn;
for (argn = 0; argn < argv.length; argn++)
client.connect(argv[argn]);
// Send a bunch of name resolution 'requests', measure time
int requests = 10000;
long start = System.currentTimeMillis();
while (requests-- > 0) {
ZMsg request = new ZMsg();
request.add("random name");
ZMsg reply = client.request(request);
if (reply == null) {
System.out.printf("E: name service not available, aborting\n");
break;
}
reply.destroy();
}
System.out.printf("Average round trip cost: %d usec\n", (int) (System.currentTimeMillis() - start) / 10);
client.destroy();
}
}
## Freelance client - Model 2
# Uses DEALER socket to blast one or more services
#
lappend auto_path .
package require TclOO
package require zmq
package require mdp
if {[llength $argv] == 0} {
    puts "Usage: flclient2.tcl <endpoint> ..."
    exit 1
}

# If not a single service replies within this time, give up
set GLOBAL_TIMEOUT 2500

oo::class create FLClient {
    variable ctx socket servers sequence

    constructor {} {
        set ctx [zmq context mdcli_context_[::mdp::contextid]]
        set socket [zmq socket mdcli_socket_[::mdp::socketid] $ctx DEALER]
        set servers 0
        set sequence 0
    }

    destructor {
        $socket setsockopt LINGER 0
        $socket close
        $ctx term
    }

    method connect {endpoint} {
        $socket connect $endpoint
        incr servers
    }

    # Send request, get reply
    method request {request} {
        # Prefix request with sequence number and empty envelope
        set request [zmsg push $request [incr sequence]]
        set request [zmsg push $request ""]

        # Blast the request to all connected servers
        for {set server 0} {$server < $servers} {incr server} {
            zmsg send $socket $request
        }

        # Wait for a matching reply to arrive from anywhere
        # Since we can poll several times, calculate each one
        set reply {}
        set endtime [expr {[clock milliseconds] + $::GLOBAL_TIMEOUT}]
        while {[clock milliseconds] < $endtime} {
            set rpoll_set [zmq poll [list [list $socket {POLLIN}]] [expr {$endtime - [clock milliseconds]}]]
            if {[llength $rpoll_set] && "POLLIN" in [lindex $rpoll_set 0 1]} {
                # Reply is [empty][sequence][OK]
                set reply [zmsg recv $socket]
                if {[llength $reply] != 3} {
                    error "expected reply with length 3"
                }
                zmsg pop reply
                set rsequence [zmsg pop reply]
                if {$rsequence == $sequence} {
                    break
                }
            }
        }
        return $reply
    }
}

# Create new freelance client object
set client [FLClient new]

# Connect to each endpoint
foreach endpoint $argv {
    $client connect $endpoint
}

# Send a bunch of name resolution 'requests', measure time
set requests 100
set start [clock microseconds]
for {set i 0} {$i < $requests} {incr i} {
    set request {}
    set request [zmsg add $request "random name"]
    set reply [$client request $request]
    if {[llength $reply] == 0} {
        puts "E: name service not available, aborting"
        break
    }
}
puts "Average round trip cost: [expr {([clock microseconds] - $start) / $requests}] usec"
$client destroy
Here are some things to note about the client implementation:
The client is structured as a nice little class-based API that hides the dirty work of creating ZeroMQ contexts and sockets and talking to the server. That is, if a shotgun blast to the midriff can be called “talking”.
The client will abandon the chase if it can’t find any responsive server within a few seconds.
The client has to create a valid REP envelope, i.e., add an empty message frame to the front of the message.
The client performs 10,000 name resolution requests (fake ones, as our server does essentially nothing) and measures the average cost. On my test box, talking to one server, this requires about 60 microseconds. Talking to three servers, it takes about 80 microseconds.
The pros and cons of our shotgun approach are:
Pro: it is simple, easy to make and easy to understand.
Pro: it does the job of failover, and works rapidly, so long as there is at least one server running.
Con: it creates redundant network traffic.
Con: we can’t prioritize our servers, i.e., Primary, then Secondary.
Con: the server can do at most one request at a time, period.
The shotgun approach seems too good to be true. Let’s be scientific and work through all the alternatives. We’re going to explore the complex/nasty option, even if it’s only to finally realize that we preferred brutal. Ah, the story of my life.
We can solve the main problems of the client by switching to a ROUTER socket. That lets us send requests to specific servers, avoid servers we know are dead, and in general be as smart as we want to be. We can also solve the main problem of the server (single-threadedness) by switching to a ROUTER socket.
But doing ROUTER to ROUTER between two anonymous sockets (which haven’t set an identity) is not possible. Both sides generate an identity (for the other peer) only when they receive a first message, and thus neither can talk to the other until it has first received a message. The only way out of this conundrum is to cheat, and use hard-coded identities in one direction. The proper way to cheat, in a client/server case, is to let the client “know” the identity of the server. Doing it the other way around would be insane, on top of complex and nasty, because any number of clients should be able to arise independently. Insane, complex, and nasty are great attributes for a genocidal dictator, but terrible ones for software.
Rather than invent yet another concept to manage, we’ll use the connection endpoint as identity. This is a unique string on which both sides can agree without more prior knowledge than they already have for the shotgun model. It’s a sneaky and effective way to connect two ROUTER sockets.
Remember how ZeroMQ identities work. The server ROUTER socket sets an identity before it binds its socket. When a client connects, they do a little handshake to exchange identities, before either side sends a real message. The client ROUTER socket, having not set an identity, sends a null identity to the server. The server generates a random UUID to designate the client for its own use. The server sends its identity (which we’ve agreed is going to be an endpoint string) to the client.
This means that our client can route a message to the server (i.e., send on its ROUTER socket, specifying the server endpoint as identity) as soon as the connection is established. That’s not immediately after doing a zmq_connect(), but some random time thereafter. Herein lies one problem: we don’t know when the server will actually be available and complete its connection handshake. If the server is online, it could be after a few milliseconds. If the server is down and the sysadmin is out to lunch, it could be an hour from now.
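To make that concrete, here is a minimal sketch (not one of the chapter's listings) of the client side of this trick. It assumes the server set its ZMQ_IDENTITY to the endpoint string "tcp://localhost:5555" before binding; the endpoint is a placeholder, and the zclock_sleep() call papers over exactly the problem just described, namely that we don't know when the connection handshake completes:
#include "czmq.h"

int main (void)
{
    zctx_t *ctx = zctx_new ();
    //  Client ROUTER socket; we never set our own identity
    void *client = zsocket_new (ctx, ZMQ_ROUTER);
    zsocket_connect (client, "tcp://localhost:5555");
    zclock_sleep (100);     //  Crudely allow the connection handshake to finish

    //  Address frame = the server's identity, i.e., its endpoint string
    zmsg_t *msg = zmsg_new ();
    zmsg_addstr (msg, "tcp://localhost:5555");
    zmsg_addstr (msg, "PING");
    zmsg_send (&msg, client);

    zctx_destroy (&ctx);
    return 0;
}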
There’s a small paradox here. We need to know when servers become connected and available for work. In the Freelance pattern, unlike the broker-based patterns we saw earlier in this chapter, servers are silent until spoken to. Thus we can’t talk to a server until it’s told us it’s online, which it can’t do until we’ve asked it.
My solution is to mix in a little of the shotgun approach from model 2, meaning we’ll fire (harmless) shots at anything we can, and if anything moves, we know it’s alive. We’re not going to fire real requests, but rather a kind of ping-pong heartbeat.
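Here is the Model Three server. The C version isn't reproduced in this section, so the sketch below approximates the same logic with the CZMQ helpers from the earlier listings; treat it as an illustration of the Lua and Python versions that follow rather than a definitive listing (in particular, it rebuilds the reply as [sequence]["OK"] instead of editing the body in place):
#include "czmq.h"

int main (void)
{
    zctx_t *ctx = zctx_new ();
    //  Prepare server socket with predictable identity
    char *bind_endpoint = "tcp://*:5555";
    char *connect_endpoint = "tcp://localhost:5555";
    void *server = zsocket_new (ctx, ZMQ_ROUTER);
    zmq_setsockopt (server, ZMQ_IDENTITY,
        connect_endpoint, strlen (connect_endpoint));
    zsocket_bind (server, bind_endpoint);
    printf ("I: service is ready at %s\n", bind_endpoint);

    while (true) {
        zmsg_t *request = zmsg_recv (server);
        if (!request)
            break;              //  Interrupted
        //  Frame 0: identity of client
        //  Frame 1: PING, or client control frame
        //  Frame 2: request body
        zframe_t *identity = zmsg_pop (request);
        zmsg_t *reply = NULL;
        if (zmsg_size (request) == 1) {
            //  Control message: answer a PING with a PONG
            char *control = zmsg_popstr (request);
            if (streq (control, "PING")) {
                reply = zmsg_new ();
                zmsg_addstr (reply, "PONG");
            }
            free (control);
        }
        else {
            //  Real request: echo the sequence number, reply "OK"
            char *sequence = zmsg_popstr (request);
            reply = zmsg_new ();
            zmsg_addstr (reply, sequence);
            zmsg_addstr (reply, "OK");
            free (sequence);
        }
        zmsg_destroy (&request);
        if (reply) {
            zmsg_prepend (reply, &identity);
            zmsg_send (&reply, server);
        }
        else
            zframe_destroy (&identity);
    }
    zctx_destroy (&ctx);
    return 0;
}
The Lua and Python versions of this server follow.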
--
--  Freelance server - Model 3
--  Uses a ROUTER/ROUTER socket but just one thread
--
--  Author: Robert G. Jakabosky <bobby@sharedrealm.com>
--
require"zmq"
require"zmsg"

local verbose = (arg[1] == "-v")

local context = zmq.init(1)
s_catch_signals ()

-- Prepare server socket with predictable identity
local bind_endpoint = "tcp://*:5555"
local connect_endpoint = "tcp://localhost:5555"
local server = context:socket(zmq.ROUTER)
server:setopt(zmq.IDENTITY, connect_endpoint)
server:bind(bind_endpoint)
printf ("I: service is ready at %s\n", bind_endpoint)

while (not s_interrupted) do
    local request = zmsg.recv (server)
    local reply = nil
    if (not request) then
        break -- Interrupted
    end
    if (verbose) then
        request:dump()
    end
    -- Frame 0: identity of client
    -- Frame 1: PING, or client control frame
    -- Frame 2: request body
    local address = request:pop()
    if (request:parts() == 1 and request:body() == "PING") then
        reply = zmsg.new ("PONG")
    elseif (request:parts() > 1) then
        reply = request
        request = nil
        reply:body_set("OK")
    end
    reply:push(address)
    if (verbose and reply) then
        reply:dump()
    end
    reply:send(server)
end
if (s_interrupted) then
    printf ("W: interrupted\n")
end
server:close()
context:term()
flserver3: Freelance server, Model Three in Python
"""Freelance server - Model 3
Uses an ROUTER/ROUTER socket but just one thread
Author: Min RK <benjaminrk@gmail.com>
"""importsysimportzmqfromzhelpersimport dump
defmain():
verbose = '-v'in sys.argv
ctx = zmq.Context()
# Prepare server socket with predictable identity
bind_endpoint = "tcp://*:5555"
connect_endpoint = "tcp://localhost:5555"
server = ctx.socket(zmq.ROUTER)
server.identity = connect_endpoint
server.bind(bind_endpoint)
print"I: service is ready at", bind_endpoint
while True:
try:
request = server.recv_multipart()
except:
break# Interrupted# Frame 0: identity of client# Frame 1: PING, or client control frame# Frame 2: request body
address, control = request[:2]
reply = [address, control]
if control == "PING":
reply[1] = "PONG"else:
reply.append("OK")
if verbose:
dump(reply)
server.send_multipart(reply)
print"W: interrupted"if __name__ == '__main__':
main()
The Freelance client, however, has gotten large. For clarity, it's split into an example application and a class that does the hard work.
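The top-level application isn't reproduced here; a minimal sketch of how an application might drive the API, using only the calls declared in the class below (the endpoints are placeholders), looks like this:
#include "czmq.h"
#include "flcliapi.h"

int main (void)
{
    //  Create new freelance client object and connect to three servers
    flcliapi_t *client = flcliapi_new ();
    flcliapi_connect (client, "tcp://localhost:5555");
    flcliapi_connect (client, "tcp://localhost:5556");
    flcliapi_connect (client, "tcp://localhost:5557");

    //  Send a bunch of name resolution 'requests', measure time
    int requests = 10000;
    uint64_t start = zclock_time ();
    while (requests--) {
        zmsg_t *request = zmsg_new ();
        zmsg_addstr (request, "random name");
        zmsg_t *reply = flcliapi_request (client, &request);
        if (!reply) {
            printf ("E: name service not available, aborting\n");
            break;
        }
        zmsg_destroy (&reply);
    }
    printf ("Average round trip cost: %d usec\n",
        (int) (zclock_time () - start) / 10);

    flcliapi_destroy (&client);
    return 0;
}
And here is the flcliapi class itself, which implements the Freelance Protocol and does the actual work of talking to servers from a background agent thread: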
// flcliapi class - Freelance Pattern agent class
// Implements the Freelance Protocol at http://rfc.zeromq.org/spec:10
#include"flcliapi.h"// If no server replies within this time, abandon request
#define GLOBAL_TIMEOUT 3000 // msecs
// PING interval for servers we think are alive
#define PING_INTERVAL 2000 // msecs
// Server considered dead if silent for this long
#define SERVER_TTL 6000 // msecs
// .split API structure
// This API works in two halves, a common pattern for APIs that need to
// run in the background. One half is a frontend object our application
// creates and works with; the other half is a backend "agent" that runs
// in a background thread. The frontend talks to the backend over an
// inproc pipe socket:
// Structure of our frontend class
struct _flcliapi_t {
zctx_t *ctx; // Our context wrapper
void *pipe; // Pipe through to flcliapi agent
};
// This is the thread that handles our real flcliapi class
static void flcliapi_agent (void *args, zctx_t *ctx, void *pipe);
// Constructor
flcliapi_t *
flcliapi_new (void)
{
flcliapi_t
*self;
self = (flcliapi_t *) zmalloc (sizeof (flcliapi_t));
self->ctx = zctx_new ();
self->pipe = zthread_fork (self->ctx, flcliapi_agent, NULL);
return self;
}
// Destructor
void flcliapi_destroy (flcliapi_t **self_p)
{
assert (self_p);
if (*self_p) {
flcliapi_t *self = *self_p;
zctx_destroy (&self->ctx);
free (self);
*self_p = NULL;
}
}
// .split connect method
// To implement the connect method, the frontend object sends a multipart
// message to the backend agent. The first part is a string "CONNECT", and
// the second part is the endpoint. It waits 100msec for the connection to
// come up, which isn't pretty, but saves us from sending all requests to a
// single server, at startup time:
void flcliapi_connect (flcliapi_t *self, char *endpoint)
{
assert (self);
assert (endpoint);
zmsg_t *msg = zmsg_new ();
zmsg_addstr (msg, "CONNECT");
zmsg_addstr (msg, endpoint);
zmsg_send (&msg, self->pipe);
zclock_sleep (100); // Allow connection to come up
}
// .split request method
// To implement the request method, the frontend object sends a message
// to the backend, specifying a command "REQUEST" and the request message:
zmsg_t *
flcliapi_request (flcliapi_t *self, zmsg_t **request_p)
{
assert (self);
assert (*request_p);
zmsg_pushstr (*request_p, "REQUEST");
zmsg_send (request_p, self->pipe);
zmsg_t *reply = zmsg_recv (self->pipe);
if (reply) {
char *status = zmsg_popstr (reply);
if (streq (status, "FAILED"))
zmsg_destroy (&reply);
free (status);
}
return reply;
}
// .split backend agent
// Here we see the backend agent. It runs as an attached thread, talking
// to its parent over a pipe socket. It is a fairly complex piece of work
// so we'll break it down into pieces. First, the agent manages a set of
// servers, using our familiar class approach:
// Simple class for one server we talk to
typedef struct {
char *endpoint; // Server identity/endpoint
uint alive; // 1 if known to be alive
int64_t ping_at; // Next ping at this time
int64_t expires; // Expires at this time
} server_t;
server_t *
server_new (char *endpoint)
{
server_t *self = (server_t *) zmalloc (sizeof (server_t));
self->endpoint = strdup (endpoint);
self->alive = 0;
self->ping_at = zclock_time () + PING_INTERVAL;
self->expires = zclock_time () + SERVER_TTL;
return self;
}
void server_destroy (server_t **self_p)
{
assert (self_p);
if (*self_p) {
server_t *self = *self_p;
free (self->endpoint);
free (self);
*self_p = NULL;
}
}
int server_ping (const char *key, void *server, void *socket)
{
server_t *self = (server_t *) server;
if (zclock_time () >= self->ping_at) {
zmsg_t *ping = zmsg_new ();
zmsg_addstr (ping, self->endpoint);
zmsg_addstr (ping, "PING");
zmsg_send (&ping, socket);
self->ping_at = zclock_time () + PING_INTERVAL;
}
return 0;
}
int server_tickless (const char *key, void *server, void *arg)
{
server_t *self = (server_t *) server;
uint64_t *tickless = (uint64_t *) arg;
if (*tickless > self->ping_at)
*tickless = self->ping_at;
return 0;
}
// .split backend agent class
// We build the agent as a class that's capable of processing messages
// coming in from its various sockets:
// Simple class for one background agent
typedef struct {
zctx_t *ctx; // Own context
void *pipe; // Socket to talk back to application
void *router; // Socket to talk to servers
zhash_t *servers; // Servers we've connected to
zlist_t *actives; // Servers we know are alive
uint sequence; // Number of requests ever sent
zmsg_t *request; // Current request if any
zmsg_t *reply; // Current reply if any
int64_t expires; // Timeout for request/reply
} agent_t;
agent_t *
agent_new (zctx_t *ctx, void *pipe)
{
agent_t *self = (agent_t *) zmalloc (sizeof (agent_t));
self->ctx = ctx;
self->pipe = pipe;
self->router = zsocket_new (self->ctx, ZMQ_ROUTER);
self->servers = zhash_new ();
self->actives = zlist_new ();
return self;
}
void agent_destroy (agent_t **self_p)
{
assert (self_p);
if (*self_p) {
agent_t *self = *self_p;
zhash_destroy (&self->servers);
zlist_destroy (&self->actives);
zmsg_destroy (&self->request);
zmsg_destroy (&self->reply);
free (self);
*self_p = NULL;
}
}
// .split control messages
// This method processes one message from our frontend class
// (it's going to be CONNECT or REQUEST):
// Callback when we remove server from agent 'servers' hash table
static void s_server_free (void *argument)
{
server_t *server = (server_t *) argument;
server_destroy (&server);
}
void agent_control_message (agent_t *self)
{
zmsg_t *msg = zmsg_recv (self->pipe);
char *command = zmsg_popstr (msg);
if (streq (command, "CONNECT")) {
char *endpoint = zmsg_popstr (msg);
printf ("I: connecting to %s...\n", endpoint);
int rc = zmq_connect (self->router, endpoint);
assert (rc == 0);
server_t *server = server_new (endpoint);
zhash_insert (self->servers, endpoint, server);
zhash_freefn (self->servers, endpoint, s_server_free);
zlist_append (self->actives, server);
server->ping_at = zclock_time () + PING_INTERVAL;
server->expires = zclock_time () + SERVER_TTL;
free (endpoint);
}
else if (streq (command, "REQUEST")) {
assert (!self->request); // Strict request-reply cycle
// Prefix request with sequence number and empty envelope
char sequence_text [10];
sprintf (sequence_text, "%u", ++self->sequence);
zmsg_pushstr (msg, sequence_text);
// Take ownership of request message
self->request = msg;
msg = NULL;
// Request expires after global timeout
self->expires = zclock_time () + GLOBAL_TIMEOUT;
}
free (command);
zmsg_destroy (&msg);
}
// .split router messages
// This method processes one message from a connected
// server:
void agent_router_message (agent_t *self)
{
zmsg_t *reply = zmsg_recv (self->router);
// Frame 0 is server that replied
char *endpoint = zmsg_popstr (reply);
server_t *server =
(server_t *) zhash_lookup (self->servers, endpoint);
assert (server);
free (endpoint);
if (!server->alive) {
zlist_append (self->actives, server);
server->alive = 1;
}
server->ping_at = zclock_time () + PING_INTERVAL;
server->expires = zclock_time () + SERVER_TTL;
// Frame 1 may be sequence number for reply
char *sequence = zmsg_popstr (reply);
if (atoi (sequence) == self->sequence) {
zmsg_pushstr (reply, "OK");
zmsg_send (&reply, self->pipe);
zmsg_destroy (&self->request);
}
else
zmsg_destroy (&reply);
}
// .split backend agent implementation
// Finally, here's the agent task itself, which polls its two sockets
// and processes incoming messages:
static void flcliapi_agent (void *args, zctx_t *ctx, void *pipe)
{
agent_t *self = agent_new (ctx, pipe);
zmq_pollitem_t items [] = {
{ self->pipe, 0, ZMQ_POLLIN, 0 },
{ self->router, 0, ZMQ_POLLIN, 0 }
};
while (!zctx_interrupted) {
// Calculate tickless timer, up to 1 hour
uint64_t tickless = zclock_time () + 1000 * 3600;
if (self->request
&& tickless > self->expires)
tickless = self->expires;
zhash_foreach (self->servers, server_tickless, &tickless);
int rc = zmq_poll (items, 2,
(tickless - zclock_time ()) * ZMQ_POLL_MSEC);
if (rc == -1)
break; // Context has been shut down
if (items [0].revents & ZMQ_POLLIN)
agent_control_message (self);
if (items [1].revents & ZMQ_POLLIN)
agent_router_message (self);
// If we're processing a request, dispatch to next server
if (self->request) {
if (zclock_time () >= self->expires) {
// Request expired, kill it
zstr_send (self->pipe, "FAILED");
zmsg_destroy (&self->request);
}
else {
// Find server to talk to, remove any expired ones
while (zlist_size (self->actives)) {
server_t *server =
(server_t *) zlist_first (self->actives);
if (zclock_time () >= server->expires) {
zlist_pop (self->actives);
server->alive = 0;
}
else {
zmsg_t *request = zmsg_dup (self->request);
zmsg_pushstr (request, server->endpoint);
zmsg_send (&request, self->router);
break;
}
}
}
}
// Disconnect and delete any expired servers
// Send heartbeats to idle servers if needed
zhash_foreach (self->servers, server_ping, self->router);
}
agent_destroy (&self);
}
flcliapi: Freelance client API in C++
// flcliapi class - Freelance Pattern agent class
// Implements the Freelance Protocol at http://rfc.zeromq.org/spec:10
#include"flcliapi.hpp"// If no server replies within this time, abandon request
constint GLOBAL_TIMEOUT = 3000; // msecs
// PING interval for servers we think are alive
constint PING_INTERVAL = 500; // msecs
// Server considered dead if silent for this long
constint SERVER_TTL = 1000; // msecs
// This API works in two halves, a common pattern for APIs that need to
// run in the background. One half is a frontend object our application
// creates and works with; the other half is a backend "agent" that runs
// in a background thread. The frontend talks to the backend over an
// inproc pipe socket created by actor object:
// Constructor
Flcliapi::Flcliapi()
: actor_(std::bind(&Flcliapi::agent, this, std::placeholders::_1, std::ref(context_))) {}
Flcliapi::~Flcliapi() {}
// connect interface
// To implement the connect method, the frontend object sends a multipart
// message to the backend agent. The first part is a string "CONNECT", and
// the second part is the endpoint. It waits 100msec for the connection to
// come up, which isn't pretty, but saves us from sending all requests to a
// single server, at startup time:
void Flcliapi::connect(const std::string& endpoint) {
zmqpp::message msg;
msg.push_back("CONNECT");
msg.push_back(endpoint);
actor_.pipe()->send(msg);
std::this_thread::sleep_for(std::chrono::milliseconds(100)); // Allow connection to come up
}
// request interface
// To implement the request method, the frontend object sends a message
// to the backend, specifying a command "REQUEST" and the request message:
std::unique_ptr<zmqpp::message> Flcliapi::request(zmqpp::message& request) {
assert(request.parts() > 0);
request.push_front("REQUEST");
actor_.pipe()->send(request);
std::unique_ptr<zmqpp::message> reply = std::make_unique<zmqpp::message>();
actor_.pipe()->receive(*reply);
if (0 != reply->parts()) {
if (reply->get(0) == "FAILED") reply.release();
} else {
reply.release();
}
return reply;
}
Server::Server(const std::string& endpoint) {
endpoint_ = endpoint;
alive_ = false;
ping_at_ = std::chrono::steady_clock::now() + std::chrono::milliseconds(PING_INTERVAL);
expires_ = std::chrono::steady_clock::now() + std::chrono::milliseconds(SERVER_TTL);
}
Server::~Server() {}
int Server::ping(zmqpp::socket& socket) {
if (std::chrono::steady_clock::now() >= ping_at_) {
zmqpp::message ping;
ping.push_back(endpoint_);
ping.push_back("PING");
socket.send(ping);
ping_at_ = std::chrono::steady_clock::now() + std::chrono::milliseconds(PING_INTERVAL);
}
return 0;
}
int Server::tickless(std::chrono::time_point<std::chrono::steady_clock>& tickless_at) {
if (tickless_at > ping_at_) tickless_at = ping_at_;
return 0;
}
Agent::Agent(zmqpp::context& context, zmqpp::socket* pipe)
: context_(context), pipe_(pipe), router_(context, zmqpp::socket_type::router) {
router_.set(zmqpp::socket_option::linger, GLOBAL_TIMEOUT);
sequence_ = 0;
}
Agent::~Agent() {}
// control messages
// This method processes one message from our frontend class
// (it's going to be CONNECT or REQUEST):
void Agent::control_message(std::unique_ptr<zmqpp::message> msg) {
std::string command = msg->get(0);
msg->pop_front();
if (command == "CONNECT") {
std::string endpoint = msg->get(0);
msg->pop_front();
std::cout << "I: connecting to " << endpoint << "..." << std::endl;
try {
router_.connect(endpoint);
} catch (zmqpp::zmq_internal_exception& e) {
std::cerr << "failed to bind to endpoint " << endpoint << ": " << e.what() << std::endl;
return;
}
std::shared_ptr<Server> server = std::make_shared<Server>(endpoint);
servers_.insert(std::pair<std::string, std::shared_ptr<Server>>(endpoint, server));
// actives_.push_back(server);
server->setPingAt(std::chrono::steady_clock::now() +
std::chrono::milliseconds(PING_INTERVAL));
server->setExpires(std::chrono::steady_clock::now() +
std::chrono::milliseconds(SERVER_TTL));
} else if (command == "REQUEST") {
assert(!request_); // Strict request-reply cycle
// Prefix request with sequence number and empty envelope
msg->push_front(++sequence_);
// Take ownership of request message
request_ = std::move(msg);
// Request expires after global timeout
expires_ = std::chrono::steady_clock::now() + std::chrono::milliseconds(GLOBAL_TIMEOUT);
}
}
// .split router messages
// This method processes one message from a connected
// server:
void Agent::router_message() {
zmqpp::message reply;
router_.receive(reply);
// Frame 0 is server that replied
std::string endpoint = reply.get(0);
reply.pop_front();
assert(servers_.count(endpoint));
std::shared_ptr<Server> server = servers_.at(endpoint);
if (!server->isAlive()) {
actives_.push_back(server);
server->setAlive(true);
}
server->setPingAt(std::chrono::steady_clock::now() + std::chrono::milliseconds(PING_INTERVAL));
server->setExpires(std::chrono::steady_clock::now() + std::chrono::milliseconds(SERVER_TTL));
// Frame 1 may be sequence number for reply
uint sequence;
reply.get(sequence, 0);
reply.pop_front();
if (request_) {
if (sequence == sequence_) {
request_.release();
reply.push_front("OK");
pipe_->send(reply);
}
}
}
// .split backend agent implementation
// Finally, here's the agent task itself, which polls its two sockets
// and processes incoming messages:
bool Flcliapi::agent(zmqpp::socket* pipe, zmqpp::context& context) {
Agent self(context, pipe);
zmqpp::poller poller;
poller.add(*self.getPipe());
poller.add(self.getRouter());
pipe->send(zmqpp::signal::ok); // signal we successfully started
while (true) {
// Calculate tickless timer, up to 1 hour
std::chrono::time_point<std::chrono::steady_clock> tickless =
std::chrono::steady_clock::now() + std::chrono::hours(1);
if (self.request_ && tickless > self.expires_) tickless = self.expires_;
for (auto& kv : self.servers_) {
kv.second->tickless(tickless);
}
if (poller.poll(std::chrono::duration_cast<std::chrono::milliseconds>(
tickless - std::chrono::steady_clock::now())
.count())) {
if (poller.has_input(*self.getPipe())) {
std::unique_ptr<zmqpp::message> msg = std::make_unique<zmqpp::message>();
pipe->receive(*msg);
if (msg->is_signal()) {
zmqpp::signal sig;
msg->get(sig, 0);
if (sig == zmqpp::signal::stop) break; // actor receive stop signal, exit
} else
self.control_message(std::move(msg));
}
if (poller.has_input(self.getRouter())) self.router_message();
}
// If we're processing a request, dispatch to next server
if (self.request_) {
if (std::chrono::steady_clock::now() >= self.expires_) {
// Request expired, kill it
self.request_.release();
self.getPipe()->send("FAILED");
} else {
// Find server to talk to, remove any expired ones
while (self.actives_.size() > 0) {
auto& server = self.actives_.front();
if (std::chrono::steady_clock::now() >= server->getExpires()) {
server->setAlive(false);
self.actives_.pop_front();
} else {
zmqpp::message request;
request.copy(*self.request_);
request.push_front(server->getEndpoint());
self.getRouter().send(request);
break;
}
}
}
}
for (auto& kv : self.servers_) {
kv.second->ping(self.getRouter());
}
}
return true; // will send signal::ok to signal successful shutdown
}
package guide;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.zeromq.*;
import org.zeromq.ZMQ.Poller;
import org.zeromq.ZMQ.Socket;
import org.zeromq.ZThread.IAttachedRunnable;

// flcliapi class - Freelance Pattern agent class
// Implements the Freelance Protocol at http://rfc.zeromq.org/spec:10
public class flcliapi
{
// If not a single service replies within this time, give up
private static final int GLOBAL_TIMEOUT = 2500;
// PING interval for servers we think are alive
private static final int PING_INTERVAL = 2000; // msecs
// Server considered dead if silent for this long
private static final int SERVER_TTL = 6000; // msecs
// .split API structure
// This API works in two halves, a common pattern for APIs that need to
// run in the background. One half is a frontend object our application
// creates and works with; the other half is a backend "agent" that runs
// in a background thread. The frontend talks to the backend over an
// inproc pipe socket:
// Structure of our frontend class
private ZContext ctx; // Our context wrapper
private Socket pipe; // Pipe through to flcliapi agent
public flcliapi()
{
ctx = new ZContext();
FreelanceAgent agent = new FreelanceAgent();
pipe = ZThread.fork(ctx, agent);
}
public void destroy()
{
ctx.destroy();
}
// .split connect method
// To implement the connect method, the frontend object sends a multipart
// message to the backend agent. The first part is a string "CONNECT", and
// the second part is the endpoint. It waits 100msec for the connection to
// come up, which isn't pretty, but saves us from sending all requests to a
// single server, at startup time:
public void connect(String endpoint)
{
ZMsg msg = new ZMsg();
msg.add("CONNECT");
msg.add(endpoint);
msg.send(pipe);
try {
Thread.sleep(100); // Allow connection to come up
}
catch (InterruptedException e) {
}
}
// .split request method
// To implement the request method, the frontend object sends a message
// to the backend, specifying a command "REQUEST" and the request message:
public ZMsg request(ZMsg request)
{
request.push("REQUEST");
request.send(pipe);
ZMsg reply = ZMsg.recvMsg(pipe);
if (reply != null) {
String status = reply.popString();
if (status.equals("FAILED"))
reply.destroy();
}
return reply;
}
// .split backend agent
// Here we see the backend agent. It runs as an attached thread, talking
// to its parent over a pipe socket. It is a fairly complex piece of work
// so we'll break it down into pieces. First, the agent manages a set of
// servers, using our familiar class approach:
// Simple class for one server we talk to
private static class Server
{
private String endpoint; // Server identity/endpoint
private boolean alive; // 1 if known to be alive
private long pingAt; // Next ping at this time
private long expires; // Expires at this time
protected Server(String endpoint)
{
this.endpoint = endpoint;
alive = false;
pingAt = System.currentTimeMillis() + PING_INTERVAL;
expires = System.currentTimeMillis() + SERVER_TTL;
}
protected void destroy()
{
}
private void ping(Socket socket)
{
if (System.currentTimeMillis() >= pingAt) {
ZMsg ping = new ZMsg();
ping.add(endpoint);
ping.add("PING");
ping.send(socket);
pingAt = System.currentTimeMillis() + PING_INTERVAL;
}
}
private long tickless(long tickless)
{
if (tickless > pingAt)
return pingAt;
return -1;
}
}
// .split backend agent class
// We build the agent as a class that's capable of processing messages
// coming in from its various sockets:
// Simple class for one background agent
private static class Agent
{
private ZContext ctx; // Own context
private Socket pipe; // Socket to talk back to application
private Socket router; // Socket to talk to servers
private Map<String, Server> servers; // Servers we've connected to
private List<Server> actives; // Servers we know are alive
private int sequence; // Number of requests ever sent
private ZMsg request; // Current request if any
private ZMsg reply; // Current reply if any
private long expires; // Timeout for request/reply
protected Agent(ZContext ctx, Socket pipe)
{
this.ctx = ctx;
this.pipe = pipe;
router = ctx.createSocket(SocketType.ROUTER);
servers = new HashMap<String, Server>();
actives = new ArrayList<Server>();
}
protected void destroy()
{
for (Server server : servers.values())
server.destroy();
}
// .split control messages
// This method processes one message from our frontend class
// (it's going to be CONNECT or REQUEST):
// Callback when we remove server from agent 'servers' hash table
private void controlMessage()
{
ZMsg msg = ZMsg.recvMsg(pipe);
String command = msg.popString();
if (command.equals("CONNECT")) {
String endpoint = msg.popString();
System.out.printf("I: connecting to %s...\n", endpoint);
router.connect(endpoint);
Server server = new Server(endpoint);
servers.put(endpoint, server);
actives.add(server);
server.pingAt = System.currentTimeMillis() + PING_INTERVAL;
server.expires = System.currentTimeMillis() + SERVER_TTL;
}
else if (command.equals("REQUEST")) {
assert (request == null); // Strict request-reply cycle
// Prefix request with sequence number
String sequenceText = String.format("%d", ++sequence);
msg.push(sequenceText);
// Take ownership of request message
request = msg;
msg = null;
// Request expires after global timeout
expires = System.currentTimeMillis() + GLOBAL_TIMEOUT;
}
if (msg != null)
msg.destroy();
}
// .split router messages
// This method processes one message from a connected
// server:
private void routerMessage()
{
ZMsg reply = ZMsg.recvMsg(router);
// Frame 0 is server that replied
String endpoint = reply.popString();
Server server = servers.get(endpoint);
assert (server != null);
if (!server.alive) {
actives.add(server);
server.alive = true;
}
server.pingAt = System.currentTimeMillis() + PING_INTERVAL;
server.expires = System.currentTimeMillis() + SERVER_TTL;
// Frame 1 may be sequence number for reply
String sequenceStr = reply.popString();
if (Integer.parseInt(sequenceStr) == sequence) {
reply.push("OK");
reply.send(pipe);
request.destroy();
request = null;
}
else reply.destroy();
}
}
// .split backend agent implementation
// Finally, here's the agent task itself, which polls its two sockets
// and processes incoming messages:
static private class FreelanceAgent implements IAttachedRunnable
{
@Override
public void run(Object[] args, ZContext ctx, Socket pipe)
{
Agent agent = new Agent(ctx, pipe);
Poller poller = ctx.createPoller(2);
poller.register(agent.pipe, Poller.POLLIN);
poller.register(agent.router, Poller.POLLIN);
while (!Thread.currentThread().isInterrupted()) {
// Calculate tickless timer, up to 1 hour
long tickless = System.currentTimeMillis() + 1000 * 3600;
if (agent.request != null && tickless > agent.expires)
tickless = agent.expires;
for (Server server : agent.servers.values()) {
long newTickless = server.tickless(tickless);
if (newTickless > 0)
tickless = newTickless;
}
int rc = poller.poll(tickless - System.currentTimeMillis());
if (rc == -1)
break; // Context has been shut down
if (poller.pollin(0))
agent.controlMessage();
if (poller.pollin(1))
agent.routerMessage();
// If we're processing a request, dispatch to next server
if (agent.request != null) {
if (System.currentTimeMillis() >= agent.expires) {
// Request expired, kill it
agent.pipe.send("FAILED");
agent.request.destroy();
agent.request = null;
}
else {
// Find server to talk to, remove any expired ones
while (!agent.actives.isEmpty()) {
Server server = agent.actives.get(0);
if (System.currentTimeMillis() >= server.expires) {
agent.actives.remove(0);
server.alive = false;
}
else {
ZMsg request = agent.request.duplicate();
request.push(server.endpoint);
request.send(agent.router);
break;
}
}
}
}
// Disconnect and delete any expired servers
// Send heartbeats to idle servers if needed
for (Server server : agent.servers.values())
server.ping(agent.router);
}
agent.destroy();
}
}
}
"""
flcliapi - Freelance Pattern agent class
Model 3: uses ROUTER socket to address specific services
Author: Min RK <benjaminrk@gmail.com>
"""importthreadingimporttimeimportzmqfromzhelpersimport zpipe
# If no server replies within this time, abandon request
GLOBAL_TIMEOUT = 3000# msecs# PING interval for servers we think are alivecp
PING_INTERVAL = 2000# msecs# Server considered dead if silent for this long
SERVER_TTL = 6000# msecsdefflciapi_agent(peer):
"""This is the thread that handles our real flcliapi class
"""pass# =====================================================================# Synchronous part, works in our application threadclassFreelanceClient(object):
ctx = None # Our Context
pipe = None # Pipe through to flciapi agent
agent = None # agent in a threaddef __init__(self):
self.ctx = zmq.Context()
self.pipe, peer = zpipe(self.ctx)
self.agent = threading.Thread(target=agent_task, args=(self.ctx,peer))
self.agent.daemon = True
self.agent.start()
defconnect(self, endpoint):
"""Connect to new server endpoint
Sends [CONNECT][endpoint] to the agent
"""
self.pipe.send_multipart(["CONNECT", endpoint])
time.sleep(0.1) # Allow connection to come updefrequest(self, msg):
"Send request, get reply"
request = ["REQUEST"] + msg
self.pipe.send_multipart(request)
reply = self.pipe.recv_multipart()
status = reply.pop(0)
if status != "FAILED":
return reply
# =====================================================================# Asynchronous part, works in the background# ---------------------------------------------------------------------# Simple class for one server we talk toclassFreelanceServer(object):
endpoint = None # Server identity/endpoint
alive = True # 1 if known to be alive
ping_at = 0# Next ping at this time
expires = 0# Expires at this timedef __init__(self, endpoint):
self.endpoint = endpoint
self.alive = True
self.ping_at = time.time() + 1e-3*PING_INTERVAL
self.expires = time.time() + 1e-3*SERVER_TTL
defping(self, socket):
if time.time() > self.ping_at:
socket.send_multipart([self.endpoint, 'PING'])
self.ping_at = time.time() + 1e-3*PING_INTERVAL
deftickless(self, tickless):
if tickless > self.ping_at:
tickless = self.ping_at
return tickless
# ---------------------------------------------------------------------# Simple class for one background agentclassFreelanceAgent(object):
ctx = None # Own context
pipe = None # Socket to talk back to application
router = None # Socket to talk to servers
servers = None # Servers we've connected to
actives = None # Servers we know are alive
sequence = 0# Number of requests ever sent
request = None # Current request if any
reply = None # Current reply if any
expires = 0# Timeout for request/replydef __init__(self, ctx, pipe):
self.ctx = ctx
self.pipe = pipe
self.router = ctx.socket(zmq.ROUTER)
self.servers = {}
self.actives = []
defcontrol_message (self):
msg = self.pipe.recv_multipart()
command = msg.pop(0)
if command == "CONNECT":
endpoint = msg.pop(0)
print"I: connecting to %s...\n" % endpoint,
self.router.connect(endpoint)
server = FreelanceServer(endpoint)
self.servers[endpoint] = server
self.actives.append(server)
# these are in the C case, but seem redundant:
server.ping_at = time.time() + 1e-3*PING_INTERVAL
server.expires = time.time() + 1e-3*SERVER_TTL
elif command == "REQUEST":
assertnot self.request # Strict request-reply cycle# Prefix request with sequence number and empty envelope
self.request = [str(self.sequence), ''] + msg
# Request expires after global timeout
self.expires = time.time() + 1e-3*GLOBAL_TIMEOUT
defrouter_message (self):
reply = self.router.recv_multipart()
# Frame 0 is server that replied
endpoint = reply.pop(0)
server = self.servers[endpoint]
ifnot server.alive:
self.actives.append(server)
server.alive = 1
server.ping_at = time.time() + 1e-3*PING_INTERVAL
server.expires = time.time() + 1e-3*SERVER_TTL;
# Frame 1 may be sequence number for reply
sequence = reply.pop(0)
ifint(sequence) == self.sequence:
self.sequence += 1
reply = ["OK"] + reply
self.pipe.send_multipart(reply)
self.request = None
# ---------------------------------------------------------------------# Asynchronous agent manages server pool and handles request/reply# dialog when the application asks for it.defagent_task(ctx, pipe):
agent = FreelanceAgent(ctx, pipe)
poller = zmq.Poller()
poller.register(agent.pipe, zmq.POLLIN)
poller.register(agent.router, zmq.POLLIN)
while True:
# Calculate tickless timer, up to 1 hour
tickless = time.time() + 3600if (agent.request and tickless > agent.expires):
tickless = agent.expires
for server in agent.servers.values():
tickless = server.tickless(tickless)
try:
items = dict(poller.poll(1000 * (tickless - time.time())))
except:
break# Context has been shut downif agent.pipe in items:
agent.control_message()
if agent.router in items:
agent.router_message()
# If we're processing a request, dispatch to next serverif (agent.request):
if (time.time() >= agent.expires):
# Request expired, kill it
agent.pipe.send("FAILED")
agent.request = None
else:
# Find server to talk to, remove any expired oneswhile agent.actives:
server = agent.actives[0]
if time.time() >= server.expires:
server.alive = 0
agent.actives.pop(0)
else:
request = [server.endpoint] + agent.request
agent.router.send_multipart(request)
break# Disconnect and delete any expired servers# Send heartbeats to idle servers if neededfor server in agent.servers.values():
server.ping(agent.router)
This API implementation is fairly sophisticated and uses a couple of techniques that we’ve not seen before.
Multithreaded API: the client API consists of two parts, a synchronous flcliapi class that runs in the application thread, and an asynchronous agent class that runs as a background thread. Remember how ZeroMQ makes it easy to create multithreaded apps. The flcliapi and agent classes talk to each other with messages over an inproc socket. All ZeroMQ aspects (such as creating and destroying a context) are hidden in the API. The agent in effect acts like a mini-broker, talking to servers in the background, so that when we make a request, it can make a best effort to reach a server it believes is available.
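To make that split concrete, here is a minimal usage sketch (not part of the original example; the class name, endpoints, and payload are placeholders) showing what the application thread sees. It only ever calls the frontend's connect, request, and destroy methods; the agent thread and its ROUTER socket stay invisible:

import org.zeromq.ZMsg;

public class flcliapiUsage
{
    public static void main(String[] args)
    {
        // Hypothetical usage: endpoints and payload are placeholders
        flcliapi client = new flcliapi();
        client.connect("tcp://localhost:5555"); // Tell the agent about some servers
        client.connect("tcp://localhost:5556");

        ZMsg request = new ZMsg();
        request.add("HELLO");                   // Application-defined payload
        ZMsg reply = client.request(request);   // Blocks until a reply or FAILED

        if (reply != null && reply.size() > 0)
            System.out.println("Received: " + reply);
        else
            System.out.println("E: request failed, no server answered in time");

        client.destroy();
    }
}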
Tickless poll timer: in previous poll loops we always used a fixed tick interval, e.g., 1 second, which is simple enough but not excellent on power-sensitive clients (such as notebooks or mobile phones), where waking the CPU costs power. For fun, and to help save the planet, the agent uses a tickless timer, which calculates the poll delay based on the next timeout we’re expecting. A proper implementation would keep an ordered list of timeouts. We just check all timeouts and calculate the poll delay until the next one.
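As a standalone illustration of that calculation (a sketch with made-up deadlines, not the agent's actual variables), the poll delay is simply "time until the earliest pending deadline", capped at a maximum and never negative:

// Tickless-timer sketch: the deadlines below are hypothetical examples
long now = System.currentTimeMillis();
long tickless = now + 1000 * 3600; // Upper bound: wake at most once an hour
long requestExpires = now + 2500;  // e.g., a pending request's deadline
long nextPingAt = now + 800;       // e.g., the next heartbeat due
tickless = Math.min(tickless, requestExpires);
tickless = Math.min(tickless, nextPingAt);
// Block in poll() only until that earliest deadline (never pass a negative timeout);
// 'poller' is assumed to be registered as in the agent loop above
long delay = Math.max(0, tickless - System.currentTimeMillis());
int rc = poller.poll(delay);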
In this chapter, we’ve seen a variety of reliable request-reply mechanisms, each with certain costs and benefits. The example code is largely ready for real use, though it is not optimized. Of all the different patterns, the two that stand out for production use are the Majordomo pattern, for broker-based reliability, and the Freelance pattern, for brokerless reliability.