Abstract
Constraint satisfaction problems are ubiquitous in many domains. They are typically solved using conventional digital computing architectures that do not reflect the distributed nature of many of these problems, and are thus illsuited for solving them. Here we present a parallel analogue/digital hardware architecture specifically designed to solve such problems. We cast constraint satisfaction problems as networks of stereotyped nodes that communicate using digital pulses, or events. Each node contains an oscillator implemented using analogue circuits. The nonrepeating phase relations among the oscillators drive the exploration of the solution space. We show that this hardware architecture can yield stateoftheart performance on random SAT problems under reasonable assumptions on the implementation. We present measurements from a prototype electronic chip to demonstrate that a physical implementation of the proposed architecture is robust to practical nonidealities and to validate the theory proposed.
Introduction
Constraint satisfaction problems (CSPs) are a fundamental class of problems in computer science with wide applicability in areas such as channel coding^{1}, circuit optimization^{2} and scheduling^{3}. Algorithms for solving CSPs are typically run on classical von Neumann computing platforms that were not explicitly designed for these types of problems. This paper addresses the question: how can we implement a more efficient computing substrate whose architecture and dynamics better reflect the distributed nature of CSPs?
Many dynamical systems that have been proposed for solving CSPs violate the ‘physical implementability’ condition^{4,5,6}. Nonphysicality arises from the use of variables that can grow without bounds as the system is searching for solutions. On the other hand, there is a long, wellestablished tradition of studying physically realizable dynamical systems, for example, in the form of artificial neural networks, to solve CSPs or ‘bestmatch problems’^{7,8}. Early attempts in this field used attractor networks, such as Hopfield networks^{9}, to solve NP hard (nondeterministic polynomialtime hard) problems such as the travelling salesman problem^{10,11}. These attractor networks, however, would often get stuck at locally optimal solutions. To overcome this problem, stochastic mechanisms were proposed^{12,13}, which require explicit sources of noise to force the network to continuously explore the solution space. While noise is an inextricable part of any physical system, dynamically controlling its power to balance ‘exploratory’ versus ‘greedy’ search, or to move the network from an exploratory phase to a greedy one according to an annealing schedule, is not a trivial operation and puts an additional overhead on the physical implementation.
Here we present a mixed analogue/digital hardware architecture whose dynamics execute an efficient search for CSP solutions without the need for external sources of noise. For this reason, the architecture can be easily and efficiently implemented using complementary metaloxide semiconductor verylargescale integration (VLSI) electronic circuits. In the proposed architecture, each variable in a CSP is represented by a node consisting of an analogue oscillator and a stateholding asynchronous digital circuit. To achieve robust and scalable computation, the nodes communicate using digital pulses, or events. This combination of analogue and digital circuits running in a hybrid continuous/eventdriven mode avoids many of the problems that affect pure analogue VLSI systems such as susceptibility to noise, degradation of analogue signals during storage and communication, and signal restoration/refresh issues. Since the oscillators are realized using analogue circuit, when fabricated, they exhibit incommensurable natural frequencies, that is, frequencies that are not rational multiples of each other. Our architecture relies on the nonrepeating phase relations among these incommensurable analogue oscillators to drive the search for optimal solutions, rather than making use of external sources of noise or relying on random fluctuations. We show that the architecture naturally reproduces the dynamics of stochastic local search (SLS) algorithms. These algorithms are typically incomplete, that is, they cannot prove that a solution does not exist for unsatisfiable problems. Our main contributions are a physical architecture with novel dynamics for solving CSPs, together with massively parallel reformulations of wellestablished CSP algorithms so that they can be efficiently instantiated on the proposed architecture. We present results from an implementation of this architecture on a prototype VLSI chip. Our results expose a surprising relation between the dynamics of coupled multistable oscillators and the search for CSP solutions and highlight a novel mode of distributed, parallel, mixed analogue/digital computation that can form the basis of various hardware/physical systems for solving CSPs.
Results
Description of the architecture
The proposed eventbased architecture for solving CSPs employs a network of nodes that communicate via digital events. A node is shown schematically in Fig. 1a. Each node has N externally accessible input ports, one internal input port, M output ports and one dummy output port. The analogue oscillator in the node generates a continuous stream of digital events that are sent to the node’s internal port: ‘in.0’. The digital logic in the node has an internal state s, which can take one of Q possible values. On the arrival of an event on any of the input ports, the node’s digital logic evaluates the index of the output port to which it should send the event based on the index of the triggered input port and on the current state of the digital logic; it updates its internal state; and it transmits the event via the output port selected (Fig. 1b). Selection of the ‘dummy’ output port ‘out.0’ is equivalent to suppressing the event. The digital logic is fully described by the event routing function g and the state update function f, which are both deterministic. Given their analogue nature, the natural frequencies of the oscillators in the different nodes are not rational multiples of each other. Due to fabricationinduced mismatch, different oscillators realized on a VLSI chip will have incommensurable natural frequencies. It is important that coupling between these physical oscillators be kept at a minimum so as to minimize the chance of phase locking.
For solving CSPs, a subset of the nodes in the network will represent the actual problem variables, while others will represent helper variables that encode other problemrelevant quantities (for example, whether a constraint is satisfied or not). The value of a variable/node at a point in time is the index of the output port on which the node emitted its last event. Thus, a variable/node with M output ports can have M possible values. The output port of one node can connect to the input ports of one or more nodes and one input port can receive events from multiple output ports. One output port cannot be connected to multiple input ports on the same node. In the following sections, we describe how to connect nodes/variables together and how to define the nodes/variables behaviour (the f and g functions) so as to solve a number of hard CSPs. The procedure to map a CSP to this distributed architecture depends on the type of the CSP but in general, the mapping is done so that the distributed and parallel dynamics of the network of nodes tries to put the problem variables/nodes in a state where their outputs satisfy all the constraints.
Figures 1c and 1d show the definition and illustrate the behaviour of an example node that has two input ports, two output ports and two possible internal states (N=M=Q=2). The state of the example node/variable is the index of the last external event it received and the node/variable advertises its state by generating an event on one of the output ports when it receives an event on the internal port ‘in.0’ as shown in Fig. 1c (we refer to this as ‘updating’). Assume this example node is the target node receiving events from multiple sources nodes. Since it is only the last received event that determines the value advertised by an updating node, the phase relations between the analogue oscillators in the network determine which of the source nodes generates the decisive event that determines the event generated by the target node. This would be the source node that updated just before the target node updates. The phase relations are continuously changing in an aperiodic manner since the oscillation frequencies are incommensurable. The shifting phase relations thus continuously change which source node manages to influence the output events of the target node.
For the node described in Fig. 1c, assume N_{1} nodes with frequencies are sending events to its ‘in.1’ port and N_{2} nodes with frequencies are sending events to its ‘in.2’ port, the fraction of 1 events generated by the target node converges to if observed for a long enough time. Assuming the differences in oscillator frequencies of the source nodes are small, the latter expression can be written as . Thus, the more nodes that try to force a target node to a particular value, the more likely the target node is to output that value, yet there is always a chance that even a single source node that is in conflict with the majority will update just before the target node updates, thereby causing the target node to go against the majority influence. As we will show in the next sections, this behaviour can be exploited to allow the network to escape from local minima where flipping a single variable/node may increase the number of violated constraints. However, a node will never take a value that is in conflict with all incoming influences which is why the globally optimal state is stable. We show that this mostly greedy, but sometimes exploratory, behaviour can be exploited to efficiently solve a variety of hard CSPs.
In contrast to simulated annealing, there is no temperature parameter that balances the greedy versus the exploratory aspect of the network behaviour. The exploration of the solution space is not noise driven but, rather, it is driven by the continuously changing, nonrepeating order of event generation. This has the advantage that no cooling schedule is needed and there is no artificial separation between an exploratory (high temperature) phase and a greedy (low temperature) phase. This enables the problem to be dynamically changed and new constraints added without having to restart any cooling schedule. Due to the incommensurable oscillation frequencies, the network trajectory is aperiodic, so the system does not get stuck in loops. One disadvantage, however, of using deterministic incommensurable oscillators to drive the search is that we cannot derive probabilistic convergence results as in simulated annealing.
Boolean satisfiability problems
Let X={x_{1},…,x_{N}} be a set of Boolean variables. A literal is either a variable or its negation. The solution to a Boolean satisfiability or KSAT problem is the variable assignment that satisfies the logical expression involving the variables of X:
where the clause c_{i} is the disjunction of K literals. KSAT for K≥3 is NP complete^{14}.
A highly efficient algorithm for solving random SAT problems is the probSAT algorithm^{15}, winner of the parallel random track of the 2014 satcompetition^{16}. probSAT iteratively modifies a variable assignment by choosing a random unfulfilled clause c_{u} and changing the assignment of (‘flipping’) a random variable x_{f} in c_{u}, thereby fulfilling c_{u}. The choice x_{f} is governed by a heuristic function f(m,b), where m (the ‘make’ heuristic) is the number of clauses that are newly fulfilled when x_{f} is flipped and b (the ‘break’ heuristic) is the number of clauses that are newly unfulfilled when x_{f} is flipped. The heuristic function is renormalized into a probability over the available choices and x_{f} is chosen according to these probabilities. The heuristic function f(m,b) can take several different forms. In our benchmarks, we use the particularly effective ‘exponential’ form:
where x and y are parameters. If no solution is found after n_{max} flips, all variables are set to new random values, that is, the algorithm is restarted.
We consider KSAT for K∈{3,4,5}. We map the probSAT algorithm with only the ‘break’ heuristic to our architecture using two types of nodes: nodes representing variables and nodes representing constraints/clauses. Each variable node has two states. It updates and advertises its state (by generating an event on one of its two output ports) whenever it receives an event from a clause node. In addition, it advertises its state whenever it receives an event from the internal oscillator.
Figure 2 shows a sample 3SAT network. When the clause node receives a break event (event arriving on one of its break input ports, one corresponding to each variable), it increments the corresponding break counter. On events from the internal oscillator, a clause node evaluates what state the connected variables have last advertised. If there is no variable in a fulfilling state, the clause node sends an event to flip the variable with the smallest associated ‘break’ count and sends a ‘break’ event to every clause node in which this variable appears with the opposite polarity to indicate that the flipped variable is the only variable keeping the constraint fulfilled. If there is exactly one fulfilling variable, the clause node only sends a ‘break’ event for that variable; every rth ‘break’ event is skipped and not sent where r is a fixed parameter. If there is more than one fulfilling variable, the clause node does not send out any events. The break counters are reset after each event from the internal oscillator.
An unfulfilled clause node thus always chooses to flip the variable with minimal break count (with ties resolved according to a fixed variable ordering). The skipping of some break events implements a ‘softening’ of this hard minimum function. This flip heuristic is deterministic and simpler than the heuristic employed by standard probSAT.
We also implement a functional replacement of the random reinitialization: if after n_{max} variable updates, no solution has been found, all literals receive a ‘flip’ message. This is not a random reinitialization, but it is a method of forcing the algorithm into a distant part of the solution space.
We compare the performance of the network to that of the standard (sequential) probSAT algorithm^{15}. Our aim here is to show that the modifications we introduced to tailor probSAT to our architecture have not degraded its performance. We evaluate the network performance in several different cases: The ‘ideal’ case where events are transmitted instantly and never lost; and a ‘nonideal’ case where events have a delay uniformly distributed between 0 and 10% of the node oscillation period and a 10% chance to get lost completely. The nonideal case simulates the imperfections of an actual physical implementation where event delivery is neither instantaneous nor guaranteed. We also consider the case in which nodes update at a pseudorandom order, which is equivalent to having the oscillator in each node generate a Poisson instead of a periodic event train. This pseudorandom update ordering more closely matches the pseudorandom selection of unsatisfied clauses in the standard probSAT algorithm.
As benchmarks, we use uniformrandomly generated 3, 4 and 5SAT instances of various sizes and (in the case of 4 and 5SAT) with various clause densities α (the ratio of clauses to variables). Different values of α have been shown to correspond to different geometrical organizations of the solution space^{17}; we have chosen the values of α so that they lie in the different hard regimes.
We cannot give a comparison with stateoftheart benchmarks because our software simulator is only able to simulate 10 cycles for each node per second if used to simulate networks implementing large modern benchmarks. On the basis of the solution times of standard probSAT, 10^{6}–10^{9} cycles are needed to solve a single modern benchmark problem (that is, 1–1,000 days to simulate the corresponding network). As an alternative, we evaluate the performance on various problem sizes to ensure that the network performance scales equally well as standard probSAT.
Figure 3 shows the median number of variable flips required to reach the solution; this measure shows the effectiveness of a SLS algorithm in an implementation independent way. The various implementations of probSAT show very similar scaling behaviour in all cases; it is therefore reasonable to assume that for large problems the network performs as well as the standard algorithm in terms of number of flips to solution. To match the 1–10 MegaFlips per second that a standard current CPU achieves when running probSAT, each node in our network version would need an analogue oscillator of frequency around 2 MHz (in the network, approximately five flips happen each cycle due to parallelism).
A potential bottleneck in our proposed hardware architecture is the event routing fabric. We can calculate the approximate routing rate necessary to match the standard implementation of probSAT by dividing the number of events to solution (Fig. 3) by the median runtime of standard probSAT, which was 10 ms for the hardest 4SAT problems. This yields a required combined event routing rate of around 100 billion events per second. Note that a single event generated by a node is typically dispatched to multiple nodes. An event generated by a node is thus typically split into multiple events, each targeting a single node. Each of these ‘split’ events is separately counted when calculating the required event rate. A 100 billion events per second rate can be achieved if the highly local nature of node communication (which reflects the local nature of the constraint graph) is exploited by a parallel event routing scheme, and if the event routing scheme supports multicasting that allows a single source event to be efficiently routed to multiple destinations.
Using this approach, we can make similar statements with respect to other SAT solvers: at what oscillation rate and event routing rate need our architecture be implemented to match the solution times of other solvers on sample problems (in this case, however, the comparison does not say anything about solution times on bigger problems, because there is no structural similarity between the solvers). For two other wellknown solvers, minisat^{18} and survey propagation^{19}, we study the solution times of the hardest 4SAT problems (for smaller problems the overhead starting the solvers would skew results unfairly in our favour); in the case of survey propagation, we only took into account the times to correctly converged solutions. Unsurprisingly, probSAT is somewhat faster; to match minisat on these problems, we require an average oscillator frequency of 0.2 kHz and an event routing rate of 0.1 billion events per second, and to match survey propagation, an oscillator frequency of 0.4 MHz and a 20 billion events per second event routing rate is needed.
Surprisingly, the nonideal network performs better than the ideal one. Losing/delaying events might increase network efficiency by making it more exploratory as clauses now have imperfect information about the state of the variable nodes. The comparison between the pseudorandom update ordering and the ordering induced by the incommensurable oscillators shows that a pseudorandom ordering is slightly more efficient.
The architecture we present in this paper is well suited for implementing SLS algorithms such as probSAT as it can exploit the nonrepeating phase relations among the incommensurable oscillators as a source of nonrepeating noise that drives the search. Local search algorithms are typically incomplete, that is, they cannot prove that a solution does not exist for unsatisfiable formulae. In Supplementary Fig.1 and Supplementary Note1, we describe how a complete SAT algorithm can be mapped to the proposed architecture. We use an algorithm that is custom tailored to the architecture and that combines an SLSlike search inspired by probSAT with a systematic pruning of the SAT solution tree. The network of nodes implementing the complete algorithm is guaranteed to either find a solution or to generate an event indicating that the problem is unsatisfiable. One disadvantage of this complete algorithm is that it requires certain events to be globally routed to all nodes, thus degrading the efficiency of any parallel event routing scheme that seeks to exploit the local nature of internode communication. These global events are a reflection of the typically centralized and sequential nature of complete algorithms.
The network of nodes implementing the complete algorithm is not as efficient as the network implementing the probSAT algorithm when solving satisfiable KSAT instances. This is illustrated in Fig. 4a where the complete algorithm takes significantly more cycles to find a solution. Although not as efficient as the probSAT network, the network implementing the complete algorithm can detect that a problem is unsatisfiable as shown in Fig. 4b.
Graph colouring problems
A kcolouring for an undirected graph G with vertices V(G) and a set of edges E(G) is a map φ:V(G)→{1, 2, …, k}. In the graph colouring problem, the goal is to find a proper kcolouring φ_{0} of G where φ_{0}(x)≠φ_{0}(y) for all {x,y}∈E(G).
To solve a kcolouring problem, we map each vertex in the graph to a network node with k input ports and k output ports as shown in Fig. 5. Whenever the internal oscillator in a node/vertex generates an event, the node advertises its colour by generating an event on one of the k output ports. Events from a node/vertex are routed to all its neighbours in the graph. Each node maintains k counters that count how many of its neighbours have a particular colour. These counters are incremented when a node receives events from its neighbours. At an internal oscillator event, if the counter corresponding to the current node colour is nonzero (one of the neighbours has the same colour), the node chooses a different colour. If the internal Boolean variable, ‘heuristic’, is true, the node chooses the colour with the fewest conflicts (smallest neighbour count). If ‘heuristic’ is false, the node chooses the next colour in a fixed arbitrary ordering of colours. The node then resets the k counters, flips the ‘heuristic’ binary variable and generates an event to advertise its colour. A min conflict heuristic thus alternates with a heuristicfree scheme to update a conflicting node each cycle.
We assessed the performance of this algorithm on several kcolouring problems of intermediate difficulty (Table 1) taken from ref. 20 in which a different massively parallel colouring algorithm (gravitational swarm intelligence) was assessed. As in the previous section on Boolean satisfiability, we cannot attempt stateoftheartsized problems since the software simulation of a large network takes an infeasibly long time. In terms of average numbers of oscillation cycles to solution, the network compares favourably with gravitational swarm intelligence^{20}.
Prototype VLSI implementation
The prototype VLSI chip that implements a version of the architecture described in this paper is composed of a twodimensional array of binary nodes that communicate using events. The problem of transmission and routing of asynchronous events has been thoroughly investigated in the neuromorphic engineering literature^{21,22}. An elegant solution uses the addressevent representation (AER) protocol. When a node generates an event on one of its output ports, it executes a handshake protocol with the ‘output AER interface’. The ‘output AER interface’ encodes the address of the output port on which the event was generated and transmits the address offchip using an output bus that has log_{2}(K_{out}) lines. K_{out} is the number of possible event sources (the output ports of all the nodes). The array has K_{in} possible event targets (the input ports of the nodes), if an event is to be sent to one of these targets, the target address is sent to the ‘input AER interface’ on a bus that has log_{2}(K_{in}) lines. The ‘input AER interface’ decodes the address and sends an event to the target element by simultaneously activating the correct row and column in the array.
The twodimensional array on the chip comprises 64*32 binary nodes/variables, that is, nodes/variables with two output ports as shown in Fig. 6a. The chip can be configured so that two, three or four adjacent variables are merged together to realize four, six or eightvalued variables, respectively. An nvalued variable (n∈{2,4,6,8}) has n output ports and n possible internal states and has 2^{n}−1 input ports. Physically, a variable has n digital input lines on which it receives an nbit binary word encoding the index of the input port receiving the event. The details of the chip nodes are given in the methods section. An offchip event router implemented on a field programmable gate array communicates with the output and input AER interfaces to route events from nodes/variables output ports to input ports according to a programmable routing table as shown in Fig. 6b.
The analogue oscillator in each node is realized using an analogue integrate and fire neuron^{23} receiving constant current injection. In the methods section, we describe the frequency distribution of the onchip nodes and analyse internode phase coherence. The node oscillators have different frequencies but there is minor coupling between them which, however, does not amount to phase locking. The exploration of different phase relations, which is crucial to the search scheme employed by the architecture, thus remains intact.
Using the node logic on the prototype chip, we implemented a 3SAT algorithm, which is based on the algorithm from ref. 24. At the end of a cycle, an unfulfilled clause sends an event to flip the last variable in its domain to generate an event. Due to the continuously changing phase relations, an unfulfilled clause effectively chooses almost at random a variable to flip similar to the algorithm in ref. 24. Details of implementing this algorithm on the hardware prototype are given in the methods section. Figure 7a shows a histogram of the average number of oscillation cycles needed to find the solution of an example 3SAT problem.
The hardware prototype can solve graph colouring problems with up to eight colours. A kcolour node/vertex advertises its colour by generating an event on one of its k output ports at the end of its internal cycle (when its internal oscillator generates an event). This event is routed to all its neighbouring nodes/vertices in the graph, which causes these nodes to take a colour that is different from the advertised colour. Details of the graph colouring algorithm implemented on the prototype chip are given in the methods section. One difficult graph for this architecture is the ‘5 × 5 queen’ graph whose solution is equivalent to finding the noninterfering positions of 5 queens on a 5 × 5 chess board. The average number of cycles needed to find a solution is shown in Fig. 7b.
Discussion
CSPs have often been examined through the lens of statistical physics^{25,26}. Within the framework of statistical physics, a CSP is formulated as a distributed system that seeks to minimize the number of frustrated interactions (violated constraints) between its elements. Direct analogies can be established between the ground energy states of physical systems (where frustrated interactions are at a minimum) and solutions to CSPs^{27}. The architecture we describe in this paper is fundamentally different from the systems analysed in the framework of statistical physics, yet it captures some of the general features of such systems: the architecture makes use of a large number of locally interacting elements that mutually constrain each other so that the system as a whole tries to go to states where the number of frustrated interactions is at a minimum.
The most distinguishing feature of our system is the mechanism used to explore the solution space. In lieu of random fluctuations, the continuously changing phase relations between incommensurable oscillators are a source of nonrepeating fluctuations that can be easily exploited in our eventbased architecture to realize efficient search algorithms. Pseudorandom number generators (PRNGs) could have been used to drive the search. Even assuming we could implement an efficient PRNG with a pernode complexity equivalent to an analogue oscillator, PRNGs would require a clock. In each clock cycle, the PRNG scheme could choose one variable/constraint to update at random. This sequential scheme will fail to exploit the distributed and highly local nature of many CSPs (a constraint involves only few variables and a variable is part of only few constraints). This distributed and local nature is precisely what we are trying to exploit with the independent nodes running in parallel and trying to attain consistency in their local neighbourhoods in the constraint graph. The PRNG scheme could update multiple, randomly chosen, variables/constraints per clock cycle. Such batch updates would go against the idea of probSAT, and SLS algorithms in general, which make local moves that change only one variable at a time and propagate its new value before updating the next variable. Since nodes in a distributed architecture need to communicate, by having them communicate in an eventbased manner at times governed by simple local analogue oscillators, we obtain (almost for free) nonrepeating fluctuations that could be easily exploited to drive a stochasticlike search.
While the architecture is general enough to allow the instantiation of various algorithms for solving CSPs, it is best suited to implementing algorithms of the local search variety in which each variable is iteratively updated based only on local information, that is, based on the state of the constraints in which it is involved.
The digital eventbased nature of node communication is key to the architecture’s scalability and configurability. These digital pulses can be transmitted and routed using a digital fabric that links together a large number of nodes. In the prototype chip, event routing is done offchip in a serial manner on a field programmable gate array board^{28}. This introduces a serial bottleneck in the otherwise massively parallel operation of the architecture. This serial bottleneck must be eliminated to reap the advantages of the massively parallel operation of the distributed architecture. A distributed event routing architecture that exploits the local nature of event communication (which reflects the local nature of the constraint graphs of many relevant problems) to route events in parallel in multiple local domain is necessary. Configurable and parallel AER routing fabrics have been proposed for use in largescale neuromorphic systems^{29,30} and could be directly adapted for use in an implementation of the described architecture. As shown when solving SAT problems, the architecture is robust to event delays and lost events that relaxes the requirements on the event routing fabric.
In simulation, we showed for the case of random SAT problems that the proposed architecture can run at a surprisingly slow mean oscillation frequency (around 2 MHz) and still attain a time to solution that is comparable to a CPU running at three orders of magnitude higher clock rate. The simple logic operations in the constraint and literal nodes can certainly run at such slow frequencies. These results indicate that the proposed architecture is a more efficient approach to solving SAT problems than conventional CPUs.
Algorithms for solving CSPs are often conceived with the digital von Neumann model of computation in mind. The results presented in this paper highlight an alternative approach that starts with no prior assumptions about the computational model, and seeks to exploit the physical characteristics of the underlying substrate in order to find a solution tailored to the computational problem at hand. In our case, we exploited the natural incommensurability of physical analogue oscillators to derive a distributed novel algorithm for solving CSPs. The resulting physical algorithms naturally admit an efficient implementation on the physical substrate that underlies their derivation. The computing architectures developed using this bottomup approach, such as the VLSI device we present in this paper, have the potential to achieve considerable performance gains in their target problems compared with conventional purely digital approaches^{31}.
Methods
Description of the chip node
When an nvalued chip node (n∈{2,4,6,8}) receives an event on one of its 2^{n}−1 input ports, say port i, The 1s in the binary representation of i denote the allowable internal states that the variable can take. The node has n possible internal states and an event on one of the 2^{n}−1 input ports can thus decide which nonempty subset of these states are allowed. If multiple states are allowed, the variable stays at its current state if the current state is one of the allowed states, otherwise it goes to the lowest index allowed state. Let i(p) be the pth bit of i where indexing starts at 1, the state update function f is thus:
The node/variable generates an event only when it receives an event from the internal oscillator on port 0. The event is generated on the port corresponding to the currents state. The event routing function g is:
Frequency distribution of chip nodes and internode coherence
Figure 8a shows the frequency distribution of the 2,048 onchip nodes. Due to transistor mismatch, the different neuron circuits have different natural oscillation frequencies for the same current injection. Since the oscillation frequencies are real numbers drawn from a probability distribution arising from the variability inherent in the fabrication process, it is impossible for an oscillator to have a natural frequency that is a rational multiple of another’s. Coupling between the oscillators, however, could cause two oscillators with nearby frequencies to lock and effectively have the same frequency. To uncover possible synchronization phenomena among the oscillators, we use the mean phase coherence (MPC) measure, which for two waveforms with instantaneous phases φ_{1}(t) and φ_{2}(t) is defined as:
We turned on all 2,048 oscillators and assumed the phase changes linearly from 0 to 2π between successive events from an oscillator. We calculated the MPC for each pair of oscillators and the distribution of MPC values is shown in Fig. 8b (red line). In equation (5), we used a time discretization of 0.1 ms and an integration time of 18 s. We randomly generated 2,048 doubleprecision frequencies from a Gaussian distribution having the same mean and variance as the frequency distribution on the chip and generated an artificial constantfrequency waveform for each of these artificial frequencies. Using the same time discretization and total integration time in equation (5), we evaluated the distribution of MPC values for all possible pairs of the artificial constantfrequency waveforms. Ideally, all these MPC values would be zero, but as shown in Fig. 8b, due to discretization and finite integration time, these uncoupled artificial waveforms have nonzero MPC values. The oscillators on the chip are more synchronized compared with the ideal case (uncoupled oscillators) as evidenced by the heavier tail of their MPC distribution. Oscillator coupling is a potentially serious problem as it compromises the exploration of all possible phase relations, which is key to the exploration of the solution space. Most pairs of onchip oscillators, however, exhibit very low MPC values that are on par with the MPC values for uncoupled oscillators. Maximum MPC value was 0.34 so no phase locking was observed.
Mapping 3SAT problems to the hardware prototype
Each problem variable is represented by one binary node and each 3SAT constraint/clause is represented by a fourvalued node. An event from the 1 port or the 2 port of a binary variable/node denotes that the variable value is 0 or 1, respectively. The constraint/clause node is in state 4 if the constraint is fulfilled, otherwise its state (1 or 2 or 3) denotes which literal in the constraint last emitted an event. Consider a 3SAT constraint C1=(L1νL2ν−L3). Events from port 2 of variables/nodes L1 and L2 and events from port 1 of L3 should put the C1 node at state 4 (constraint fulfilled). A complementary event, that is, an event that does not cause the constraint to be fulfilled (for example, an event from port 2 of L3) should do nothing if the constraint is fulfilled as we assume one, or both, of the other two variables fulfil the constraint. However, if the constraint is not fulfilled, a complementary event from the kth variable in the constraint should put the constraint node in state k.
When the constraint node advertises its state by an event, events from ports 1, 2 or 3 should set the first, second or third variable, respectively, to a constraintfulfilling state. In the scheme described so far, when a constraint node is fulfilled (in state 4), events from the variables will never move it away from the fulfilled state. To address this, whenever the constraint node generates an event on port 4, this event is routed back to the constraint node and moves it to an arbitrary unfulfilled state (we arbitrarily choose state 3). Thus, within each oscillation cycle of the constraint node, the node has to receive a constraintfulfilling event from one of its variables to go to state 4 and not to generate an event at the end of the oscillation cycle that forces one of these variables to fulfil the constraint. The constraint nodes were picked from among the nodes with the lowest oscillation frequencies. The globally optimal solution is thus stable as the variable(s) fulfilling a constraint will always be able to generate at least one event that puts the constraint node in a fulfilled state during each cycle of the constraint node.
The above scheme is implemented by routing events according to Fig. 9a. Events from variable nodes cannot dislodge a constraint node from the fulfilled state or state 4. Note that state 4 of a constraint node is the lowest priority state according to the state update function in equation (2) so an input event to a constraint node that encodes that state 4 and state k (k∈{1,2,3}) are allowed will always put the constraint node in state k, if it was not already at state 4. If a variable appears with the same sign in multiple constraints (negated or nonnegated in all of them), an event generated by one of these constraint nodes that forces this common variable to go to a fulfilling state will automatically fulfil the other constraints as well so we route such events to the other constraints so as to move them to the fulfilled state as shown in Fig. 9a and prevent them from unnecessarily flipping other variables. This scheme for solving 3SAT is less powerful than the probSATbased approach described before. At the end of the cycle of an unfulfilled constraint node, the constraint node simply flips the last variable in its domain to generate an event. Due to the continuously shifting phase relations, the choice of which variable to flip is done almost at random with no regard for how many other constraints would be violated due to this flip.
Mapping graph colouring problems to the hardware prototype
To solve a graph colouring problem on the prototype chip, we can use a simple scheme where a graph vertex is represented by a chip node and events from port p of a node (which indicates that the node is in state/colour p) go to input port 2^{n}−1−2^{p−1} (all 1s binary string except at position p; p index starts from 1) of all adjacent nodes/vertices in the graph. We call these input ports the pexclude input ports as receiving an event on them instructs the node to go to any state except p, thereby enforcing the constraint. However, this scheme will not work since a node always goes to an allowed state that has the lowest index when responding to an exclude event (equation (3)). All the nodes would thus quickly get stuck in the 1 and 2 states as the oneexclude and twoexclude events that the nodes send to each other will not be able to move any node out of these 2 states. We use the more elaborate scheme shown in Fig. 9b where two fourvalued chip nodes are used to implement one fourvalued graph vertex. The value of this graph vertex is index of the last event emitted by the ‘main’ chip node.
Pairwise inequality constraints are implemented by routing events from the iexclude output port of one vertex to the iexclude input port of the other vertex. Assume a vertex has value 1, that is, the state of the main (helper) chip nodes are 1 (4). The state/colour of this vertex will only change if it receives an event on the oneexclude port. In that case, the ‘main’ and ‘helper’ chip nodes go to states 2 and 1, respectively, since these are the lowest index allowed states in the two chip nodes. The two chip nodes now have inconsistent states and whichever of them generates an event first forces the other node to switch its state; for example, if the ‘helper’ node generates an event first, it forces the ‘main’ node to take state 4. A oneexclude input event effectively has a 50% chance of moving this graph node to state 2 and a 50% chance to move it to state 4 due to the irregular phase relations.
The scheme can be extended to six and eightvalued vertices using three sixvalued and four eightvalued chip nodes, respectively, to represent a single graph vertex and it is straightforward to show that using this scheme, the network representing the colouring graph always uses all available colours. Three, five and sevencolouring problems can be implemented by adjusting the even colour schemes so that events are routed to input ports that exclude both the colour/index of the source output port, as well as the highest index/colour that will then be unused.
Additional information
How to cite this article: Mostafa, H. et al. An eventbased architecture for solving constraint satisfaction problems. Nat. Commun. 6:8941 doi: 10.1038/ncomms9941 (2015).
References
 1
MacKay, D. J. C. Information Theory, Inference and Learning Algorithms Cambridge Univ. Press (2003).
 2
Kirkpatrick, S., Gelatt, D. & Vecchi, M. P. Optimization by simulated annealing. Science 220, 671–680 (1983).
 3
Garey, M. R., Johnson, D. S. & Sethi, R. The complexity of flowshop and jobshop scheduling. Math. Oper. Res. 1, 117–129 (1976).
 4
Zhang, S. & Constantinides., A. G. Lagrange programming neural networks. IEEE Trans. Circuits Syst. II Analog Digit. Signal. 39, 441–452 (1992).
 5
Nagamatu, M. & Yanaru, T. On the stability of lagrange programming neural networks for satisfiability problems of prepositional calculus. Neurocomputing 13, 119–133 (1996).
 6
ErcseyRavasz, M. & Toroczkai, Z. Optimization hardness as transient chaos in an analog approach to constraint satisfaction. Nat. Phys. 7, 966–970 (2011).
 7
Minsky, M. L. & Papert., S. A. Perceptrons: An Introduction to Computational Geometry MIT Press (1969).
 8
Rumelhart, D. E. & McClelland, J. L. Parallel Distributed Processing: Explorations in the Microstructure of Cognition MIT Press (1986).
 9
Hopfield, J. J. Neural networks and physical systems with emergent collective computational abilities. Proc. Natl Acad. Sci. USA 79, 2554–2558 (1982).
 10
Hopfield, J. J. & Tank, D. W. neural computation of decisions in optimization problems. Biol. Cybern. 52, 141–152 (1985).
 11
Hopfield, J. J. & Tank, D. W. Computing with neural circuits a model. Science 233, 625–633 (1986).
 12
Habenschuss, S., Jonke, Z. & Maass, W. Stochastic computations in cortical microcircuit models. PLoS Comput. Biol. 9, e1003311 (2013).
 13
Maass, W. Noise as a resource for computation and learning in networks of spiking neurons. Proc. IEEE 102, 860–880 (2014).
 14
Sipser., M. Introduction to the Theory of Computation International Thomson Publishing (1996).
 15
Balint, A. & Schöning, U. in Theory and Applications of Satisfiability Testing–SAT 2012 16–29Springer (2012).
 16
Belov, A., Diepol, D., Heule, M. & Järvisalo, M. Sat Competition 2014. Available at http://www.satcompetition.org/2014/ (2014).
 17
Montanari, A., RicciTersenghi, F. & Semerjian, G. Clusters of solutions and replica symmetry breaking in random ksatisfiability. J. Stat. Mech. Theory Exp. 2008, P04004 (2008).
 18
Sorensson, N. & Een, N. Minisat v1. 13a sat solver with conflictclause minimization. SAT 2005, 53 (2005).
 19
Braunstein, A., Mézard, M. & Zecchina, R. Survey propagation: An algorithm for satisfiability. Random Struct. Algorithms 27, 201–226 (2005).
 20
Ruiz, I. R. & Romay, M. G. in Nature Inspired Cooperative Strategies for Optimization (NICSO 2011) 159–168Springer (2011).
 21
Deiss, S. R., Douglas, R. J. & Whatley., A. M. in Pulsed Neural Networks eds Maass W., Bishop C. M. Ch. 6 157–178MIT Press (1998).
 22
Boahen, K. A. Pointtopoint connectivity between neuromorphic chips using addressevents. IEEE Trans. Circuits Syst. II Analog Digit. Signal 47, 416–434 (2000).
 23
Chicca, E., Stefanini, F., Bartolozzi, C. & Indiveri, G. Neuromorphic electronic circuits for building autonomous cognitive systems. Proc. IEEE 102, 1367–1388 (Sep 2014).
 24
Papadimitriou, C. H. in Proceedings 32nd Annual Symposium on Foundations of Computer Science 163–169IEEE (1991).
 25
Mézard, M., Parisi, G. & Zecchina, R. Analytic and algorithmic solution of random satisfiability problems. Science 297, 812–815 (2002).
 26
Krzakala, F. & Kurchan, J. Landscape analysis of constraint satisfaction problems. Phys. Rev. E 76, 021122 (2007).
 27
Barahona, F. On the computational complexity of ising spin glass models. J. Phys. A Math. Gen. 15, 3241 (1982).
 28
Fasnacht, D. B. & Indiveri, G. in Conference on Information Sciences and Systems, CISS 2011 1–6Johns Hopkins University (2011).
 29
Joshi, S., Deiss, S., Arnold, M., Yu, T. & Cauwenberghs, G. in Cellular Nanoscale Networks and Their Applications (CNNA), 2010 12th International Workshop on 1–6IEEE (2010).
 30
Merolla, P., Arthur, J., Alvarez, R., Bussat, J.M. & Boahen, K. A multicast tree router for multichip neuromorphic systems. IEEE Trans. Circuits Syst. I Regul. Pap. 61, 820–833 (2014).
 31
Indiveri, G. & Liu, S. C. Memory and information processing in neuromorphic systems. Proc. IEEE 103, 1379–1397 (2015).
Acknowledgements
This work was supported by the European CHISTERA program, via the ‘Plasticity in NEUral Memristive Architectures’ (PNEUMA) project and by the European Research council, via the ‘Neuromorphic Processors’ (neuroP) project, under ERC grant number 257219.
Author information
Affiliations
Contributions
The hardware architecture was developed by H.M. with minor contributions from L.K.M. and G.I.; the probSAT network by L.K.M.; the hybridization with DPLL by H.M.; the kcolouring solver equally by H.M. and L.K.M.; the hardware prototype by H.M. Behavioural simulators were written and simulations were run by L.K.M. and H.M. Experimental chip measurements were obtained by H.M. The paper was written by H.M., L.K.M. and G.I.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Information
Supplementary Figure 1, Supplementary Note 1 and Supplementary References (PDF 155 kb)
Rights and permissions
This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
About this article
Cite this article
Mostafa, H., Müller, L. & Indiveri, G. An eventbased architecture for solving constraint satisfaction problems. Nat Commun 6, 8941 (2015). https://doi.org/10.1038/ncomms9941
Received:
Accepted:
Published:
Further reading

Perspective on photonic memristive neuromorphic computing
PhotoniX (2020)

Using synchronized oscillators to compute the maximum independent set
Nature Communications (2020)

Memory devices and applications for inmemory computing
Nature Nanotechnology (2020)

A Spiking Neural Network Model of Depth from Defocus for Eventbased Neuromorphic Vision
Scientific Reports (2019)

A continuoustime MaxSAT solver with high analog performance
Nature Communications (2018)
Comments
By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.