  
  Can we automate the generation of algorithms? Can we devise a technique to discover new ones? Rather than analysing known algorithms, I intend to define a search space of algorithms, verify that a desired algorithm is present in it, and then find it. My primary aim is to enable an experience of abstract reasoning and the joy of algorithm discovery. Before plunging into the machinery that enables us to discover an algorithm, let us meditate for a moment on the universality of inventiveness: the general process of scientific thought seems to hold irrespective of one's domain of specialisation. This post is the first of many in a series on composing a technique to automatically discover new algorithms.

  Say you got a chance to closely but unobtrusively observe your favourite brilliant scientist. As they go about finding an elegant solution to a new problem, a lot of exploration happens. They study the literature, looking at existing techniques and their numerous variations, from their own field and any other relevant fields. They concoct new combinations of known techniques and variations, failing at many and succeeding at a few, and start the exploration all over again on those pieces.

  The design of algorithms began as an art, and even today most people learn it only by watching others perform. Such learning or exploration more often than not involves picking up new techniques, making corrections, and conjecturing which techniques could serve as sub-steps of the desired overall solution. A lot of thought exercises and trial-and-error experimentation go into it. The scientist is helped in this exploration by an unwavering focus on the goal, an uncommonly precious common sense, incisive thought and insight, and an energy, intensity, and uncompromising zeal to find the best possible (optimal) solution.

  There seems to be something universal about advanced scientific problem solving by intelligent humans, even though their needs, interests, and training may be worlds apart. Fascinated by this observation, I asked myself: can we build an AI system to mimic this complex process autonomously and discover new solutions that are useful and intelligent, if not downright brilliant?

If we can indeed build such a capable intelligent system, it would beat the best human scientists in at least three ways:
  • Area Agnostic -- the same system can solve complex problems from literally any field;
  • Blazingly Brisk -- its process of scientific discovery is likely to be orders of magnitude faster than that observed in brilliant humans; and
  • Cognitively Consistent -- its power to ponder and discover would be constantly on and full, unaffected by moods, emotions, fears of failure, etc.


The Melting Pot


  A brilliant scientist is good at understanding the field. They break the complexities of the subject into various simplexities, and if they can recall all of those disassociated simplexities, they get an understanding of the big picture. This whole process hinges on understanding.

  Consider, for example, the number 8. A scientist can count up to eight, knows what eight is, can compute with it, and can infer its relationships with other observations, but cannot comprehend all of this at once. This sets a theoretical limit for our scientist. Our system can surpass this limit through collective comprehension abilities and a separation of data from analysis. The goal is to contribute to a science of non-understanding, as opposed to understanding: solving problems without necessarily understanding their complexities.

  The system need not understand the domain, need not extract features from data, and may not even need to look at the data. Nor does it simply do an efficient but blind search through an enormous hypothesis space. In short, it does not look in the space of all possible explanations for the observed data, as, say, a deep learning network would.

  The system should be far more efficient than today’s deep learning methods: it rapidly searches through a much smaller, much better organised space of plausible new algorithms that custom-fit the given data. These combinatorial algorithms (counting, permuting, etc.) can then be applied to the data to find patterns, to build a model, and so on. Until this last step, it is possible that it would not even need to see the data. Its domain of play is the melting pot of the collective and connected thoughts of dozens or perhaps hundreds of human scientists. It is in this melting pot that techniques keep taking birth, evolving, morphing, combining and disintegrating.

  I do not intend to propose a fully contained self-referential system. I am mindful of the impossibility of reducing the I/O processing of the human mind to a complex algorithm. This imposes a theoretical limit on the nature of discoverable algorithms: the system can never come up with an algorithm of its own that can discover fundamental truths about algorithms.

  Nothing can yet be decided about the computability of my technique or of the algorithms discoverable by it. The same can be said about automated reasoning over those algorithms.


Representation


  An abstract entity performing a transformational process in the computing world is called an algorithm. Its precise representation in a computer-readable formalism is called a program, which may be written in any programming language.

The problem that an algorithm is set to solve is specified by two components:
  • A precise definition of the set of legal inputs
  • A precise characterisation of the desired output as a function of the input

  We regard the input and output sets as data, and an algorithm as a set of instructions operating on them. Many real-world algorithmic problems are not easy to define this way: consider, for example, weather prediction or the evaluation of stock market investments, whose inputs and outputs are hard to specify precisely.

  Technical studies of algorithms are mostly concerned with verification (proof of correctness) of a given algorithm or with examining its characteristics (computational complexity) as a computational tool. Such mathematical studies tell us little about how the algorithm was developed in the first place. Let us consider a small example involving sorting and searching algorithms. Take the bubble sort algorithm, analysed by Donald Knuth in his famous book The Art of Computer Programming.

  This algorithm, operating on n items, has a running time proportional to \(n^{2}\), which grows very fast in practice. The excitement around algorithms grew when people began to design faster sorting algorithms like quicksort, shellsort, merge sort, radix sort and heapsort. The programs implementing these algorithms were supplemented by informal proofs of correctness, by mathematical induction on properties of the natural numbers. Cleverer methods were developed, such as operating on pointers to the data instead of moving the data itself. These involved various combinatorial techniques.
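
  To make the quadratic behaviour concrete, here is a minimal sketch of bubble sort (Python is my illustration language throughout this series; the post itself prescribes none):

```python
def bubble_sort(items):
    """Sort a list in place by repeatedly swapping adjacent
    out-of-order pairs; two nested passes give O(n^2) time."""
    n = len(items)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):
            if items[j] > items[j + 1]:
                items[j], items[j + 1] = items[j + 1], items[j]
                swapped = True
        if not swapped:  # no swaps in a full pass: already sorted
            break
    return items

print(bubble_sort([5, 1, 4, 2, 8]))  # [1, 2, 4, 5, 8]
```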

  Such techniques emerged from the need to make programs run faster. We aim to tackle a few combinatorial problems from a different perspective, building algorithms from first principles using techniques from algebraic topology, generalisation theory, formulaic manipulation, and reasoning about changes in function values.

  This series is a wide-ranging treatment, attuned to the competing ideas of algorithmic science, of how mathematical exploration responds to interesting facts about algorithms. I concern myself with developing algorithms from first principles, departing from the standard approach of analysing and applying known ones.


Algorithmic Chemistry


  Walter Fontana, in a 1991 paper, reported a novel approach to the evolution of emergent phenomena. The approach was to let small LISP algorithms act on each other to produce new LISP algorithms, which are then added back into the primordial melting pot of LISP algorithms, much like a chemical system. The effort produced simple self-organising phenomena but failed to produce complex structures of any particular interest.
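
  A toy rendering of the idea, not Fontana’s actual system: Python lambdas stand in for LISP expressions, and the primitives, the probe, and the dilution rule are my own illustrative choices.

```python
import random

# Toy 'molecules': unary functions on integers mod M, standing in
# for Fontana's LISP expressions. Working mod M keeps values small.
M = 97
pot = [lambda x: (x + 1) % M,
       lambda x: (2 * x) % M,
       lambda x: (x * x) % M]

def react(f, g):
    """Application as chemistry: two molecules collide and compose."""
    return lambda x: f(g(x))

random.seed(0)
for _ in range(200):
    f, g = random.sample(pot, 2)          # pick two molecules
    pot.append(react(f, g))               # throw the product back in
    if len(pot) > 50:                     # crude outflow, as in a reactor
        pot.pop(random.randrange(len(pot)))

# Fingerprint the surviving molecules by their behaviour on a probe input.
print(sorted({m(3) for m in pot})[:10])
```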

  The simulations were carried out at large scale over long durations, yet they failed to yield anything dramatic. Many variants of the style followed at the time, employing various biologically, physically and chemically inspired stochastic replacement rules to prune uninteresting algorithms. None succeeded in producing complex structures. To achieve a variety of layered and complex algorithm networks, AI tools like tensor factorisation, hypergraph mining and probabilistic logic can be joined to Fontana’s approach.

  Each variety of melting pot would hold a lattice of self-organising program networks carrying the desired functionality. Each set of algorithm networks collectively transforms inputs to outputs as desired by an AI goal system. This can be seen as a type of probabilistic model-building genetic algorithm acting on an algorithmic chemical network. The rich combinatorial potential of these networks lends itself either to complex emergent phenomena or to a complete non-inferential digital mess.

  Factors like the stability of reaction networks matter as much as the programming language underlying the algorithms (from machine language through assembly language to high-level languages). The pots can be compartmentalised for parallel processing. Each LISP algorithm (with its constituent expressions and data) is seen as a molecule (atom) with an associated mass and novelty score.

  The novelty score of a molecule (atom) is the distance between the data flow paths within itself and those of a standard known algorithmic technique for a given problem. For example, consider the set of integers from 2 to N, with each element of this set as a molecule in a melting pot. Under a reaction rule where every molecule kills the multiples of itself, we end up with all the primes between 2 and N. The novelty of this phenomenon is compared with other discovered or known techniques in terms of data flow pathways to obtain a novelty score. The score of a network is assigned to its constituent algorithms. The evolution of novelty is modelled using a sparse vector with an entry for each constituent algorithm, and useful interaction information from a series of these vectors is obtained using information-theoretic measures.
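
  The prime example above, phrased as a reaction in code (a minimal sketch of the rule as I read it):

```python
# The reaction rule from the text: every molecule 'kills' the proper
# multiples of itself. What remains is exactly the primes, i.e. the
# Sieve of Eratosthenes phrased as chemistry.
N = 50
pot = set(range(2, N + 1))
for m in sorted(pot):                     # snapshot of the initial pot
    if m in pot:                          # m has survived so far
        pot -= {k for k in range(2 * m, N + 1, m)}
print(sorted(pot))  # [2, 3, 5, 7, 11, 13, ..., 47]
```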

  Learning this evolution leads us to the duality of searchability and decidability: is the desired algorithm present in the melting pot (decidability), and can we find the desired algorithm in the melting pot (searchability)? With SMT-based search methods, both questions become essentially the same.

  Given an oracle that answers decidability questions, an efficient search for the desired algorithm can be carried out using SMT solvers. With concurrent dynamics in our melting pots, we ask another related question: can the oracle be queried concurrently? If the algorithm is present in the pot, we can randomly reduce the SMT-based search to several satisfiability formulas, one of which is a probable solution.
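
  To make the reduction tangible, here is a sketch with Z3 (my choice of SMT solver, via the z3-solver Python package; the text prescribes no particular solver). For a tiny hypothesis space of linear programs, presence and retrieval collapse into one satisfiability query:

```python
from z3 import Int, Solver, sat  # pip install z3-solver

# Hypothesis space: programs of the form f(x) = a*x + b.
a, b = Int('a'), Int('b')
examples = [(0, 3), (1, 5), (2, 7)]   # desired input/output pairs

s = Solver()
for x, y in examples:
    s.add(a * x + b == y)

if s.check() == sat:                  # decidability: is it in the pot?
    m = s.model()                     # searchability: pull it out
    print(f"f(x) = {m[a]}*x + {m[b]}")  # f(x) = 2*x + 3
else:
    print("no such program in this space")
```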

  Using hardness assumptions, the randomness can be eliminated, and one of the SMT formulas is likely to have a solution that is also a solution to the original problem. Let us briefly look at the introduction, removal and movement of algorithms within the compartments of the pots. The usefulness and coherence of algorithms are highly contextual. For example, algorithm A may act as a catalyst in a reaction where B and C react to make D, and D then reacts with B to make more C and B.

  A direct mechanism can cause a cyclic reaction of multiplicity greater than two, or simply result in a chain reaction. Given the scale of combinatorics involved in these reaction networks, it is entirely unclear how to snip a pool of algorithms, with their ever-growing overlapping and/or disjoint networks. One fundamental aspect missing from these reaction networks is structure; in other words, the melting pot is well stirred. The molecules always come in populations, and large numbers are needed for predictability. My primary intent is to draw attention to the atoms and molecules and to ascribe a mathematical structure to them. The idea is that atoms of algorithms (data and instructions) combine into molecules of algorithms (a control structure). We call molecules configurations and atoms generators. Representative classes of such generators form standard bases. Each standard basis denotes the instruction and data set of a desired algorithm.


Intuition


  I begin with the motivation for building a specialised language in which an algorithm is anticipated to be easily expressible. I then offer a formulation of a universal program, in that language, that can build itself. We resort to tools from algebraic topology to search for a component of the universal program that solves a given problem, and then transform that component from its algebraic form into a computational program. This approach is joined by an abstract notion of the geometry of algorithms, which I will expand on later in the series, in another blog post.

To make algorithms quickly and directly, instead of by proxy, we depart from the approach of inventing ways to construct unobservable configurations through algorithmic chemical reactions. Using algebraic topology, we look at two aspects:
  • How are instructions and data connected into an algorithm?
  • What can happen to such connections?

  I then build a framework for algorithm discovery by answering both aspects of this inquiry. This will show how general algorithm development fits into an algebraic topological scheme of things, deriving some algorithms in the process. Once the decidability of an algorithm’s existence is established, searchability reduces to an optimisation problem. Depending on its formulation, be it cost function, energy or metric minimisation, we draw a correspondence to topology. The idea is to form an analogous graded mapping in which consecutive maps annihilate each other, i.e., the boundary of a simplex (an n-dimensional triangle) does not itself have a boundary. Thus searchability is viewed as equivalent to mapping something to zero, where this something may be the boundary of a boundary or a vector space (any metric on a manifold).
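
  As a worked instance (my own small check, using the standard alternating-sign boundary map on an oriented 2-simplex \([v_{0}, v_{1}, v_{2}]\)):

$$ \delta [v_{0}, v_{1}, v_{2}] = [v_{1}, v_{2}] - [v_{0}, v_{2}] + [v_{0}, v_{1}] $$
$$ \delta^{2} [v_{0}, v_{1}, v_{2}] = ([v_{2}] - [v_{1}]) - ([v_{2}] - [v_{0}]) + ([v_{1}] - [v_{0}]) = 0 $$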

  Consider a collection of algorithms in the melting pot. Each algorithm in turn is sub-divided into a collection of sub-algorithms. The properties of the sub-algorithms can be inferred from the incidence (connectedness) relations among them. Each sub-algorithm is considered irreducible and is called an instruction. The whole algorithm is then represented by a collection of instructions and input data.

  Think of an algorithm A. Each instruction in A has many properties (sub-functionality, complexity, operators, the order of its operands, etc.). To bring about a notion of equality among instructions, we say each instruction depends on and produces other instructions, which are said to be connected, or incident, to it. The collection of instructions connected to a given instruction forms the boundary of that instruction.

  In short, all instructions are equal in the sense that they all have incidence relations among them. These incidences define a boundary for each instruction. The idea extends to the space of all algorithms, with incidences and boundaries described among them. It is important to note that the input/output elements of an algorithm, unlike its instructions, have no elements incident to them. This means the boundary of a data element is absent.

  Consider the collection of algorithms in the melting pot, and assume we have a notion of the geometry of an algorithm; that is, say we can, in a topographical sense, map the shape of an algorithm. We can then define an equivalence relation on the collection of algorithms based on this geometric property: two algorithms are called ‘equal’ if they yield the same geometry or shape, according to a given definition.

  Let us assume this relation partitions the collection of algorithms into four sub-collections. We call the collection of these four sub-collections a quotient collection. This notion of quotient collection and equivalence relation can likewise be applied within a collection of instructions. Let us take a short detour to define a few concepts that will help us investigate the connectedness of data and instructions.


Algebraic Topology


  Let me provide a minimal list of definitions and results from abstract algebra and topology that may prove useful in our further discussion of algorithm discovery. These concepts helped me organise my thoughts with regard to investigating aspects of algorithms.

  A topology on a collection \(A\) is a family \(T\) of sub-collections of \(A\), containing \(A\) itself and the empty collection, such that arbitrary unions and finite intersections of members of \(T\) also belong to \(T\). We call a collection a topological space whenever it has some topology defined on it, and denote the space by \((A, T)\).

  A collection of elements with a well-defined associative binary operation (like addition or multiplication), with an identity and inverses, is called a group. It is called an additive group when the operation is also commutative. The torsion part of an additive group \(G\) is the sub-group of elements \(g\) in \(G\) such that \( ng = 0 \) for some natural number \(n\).

  A group of particular interest is \( Z \), the group of integers: the addition or subtraction of two integers yields another integer, so \( Z \) is an additive group. In certain situations, we regard two integers which differ by a multiple of a fixed prime number p as equal. We write \( Z_{p} \) for the collection of such integers, with addition and subtraction carried out modulo p. We call \( Z_{p} \) a cyclic group of order p. A famous theorem says that a finitely generated additive group is a direct sum of finitely many copies of the cyclic group \(Z\) and groups \( Z_{p} \). Elements which, by repeated application on themselves and each other, produce all elements of the group are called generators. A cyclic group can be generated as the powers of a single generator.
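
  A quick check of that last claim, using 3 as a generator of the multiplicative group of integers mod 7 (my own small example):

```python
# Powers of a single generator produce the whole group: 3 generates
# the multiplicative group {1, ..., 6} of integers mod 7.
p, g = 7, 3
powers = {pow(g, k, p) for k in range(1, p)}
print(powers == set(range(1, p)))  # True
```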

  I sometimes forget the mathematical definitions in detail; many have been treated better elsewhere, and I write them here mainly to remind myself of the ideas behind them. For the sake of a better perspective, here is a categorical representation of the entities:

[Figure: a categorical diagram relating collections, groups, rings and modules.]
  A collection is conventionally called a set, and the following definitions are written in terms of additive groups. For multiplicative groups, replace group addition by multiplication, negation by inversion, and the identity by one.

  A semigroup is a set \(G\) together with an associative binary operation:

$$ G \times G \longrightarrow G $$
  That is, for any \(x\), \(y\), \(z\) belonging to \(G\),

$$ (x+y)+z = x+(y+z) $$
  A monoid is a semigroup with an identity \(e\), such that for any \(x\) belonging to \(G\),

$$ x+e = e+x = x $$
  A group is a monoid such that for any \(x\) belonging to \(G\), there exists an inverse \(-x\) such that:

$$ x+(-x) = (-x)+x = e $$
  An additive group, also called an abelian group, is a commutative group. That is, for \(x\), \(y\) belonging to group \(G\),

$$ x+y = y+x $$
  The set of integers under addition is an abelian group, and so is the set of real numbers under addition (under multiplication, the nonzero reals form one). A subgroup is a sub-collection preserving the group operation. A ring is a collection together with two identities \((1, 0)\) corresponding to two binary operations \((*, +)\), such that the collection is an additive group under addition and a monoid under multiplication, the two operations being bound by the law of distributivity:
$$ x(y+z) = xy+xz $$
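
  These axioms are all finitely checkable for a small example; here is a brute-force verification for \(Z_{6}\) under addition mod 6 (my own illustration):

```python
from itertools import product

# Brute-force check of the group axioms for Z_6 under addition mod 6.
G = range(6)
op = lambda x, y: (x + y) % 6

assoc = all(op(op(x, y), z) == op(x, op(y, z)) for x, y, z in product(G, G, G))
identity = all(op(x, 0) == op(0, x) == x for x in G)       # e = 0
inverses = all(any(op(x, y) == 0 for y in G) for x in G)   # every x has -x
commut = all(op(x, y) == op(y, x) for x, y in product(G, G))
print(assoc, identity, inverses, commut)  # True True True True
```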


  For example, the set of real numbers under addition and multiplication is a ring. An integral domain is a ring which is commutative under multiplication and has no zero divisors. That is, no nonzero elements \(x\), \(y\) exist such that \(xy = 0\), where \(0\) is the identity element under addition.

  A field is an integral domain in which every element except zero has a multiplicative inverse. A [left] vector space \(X\) over a field \(K\) is a collection \(X\) together with a scalar multiplication map such that:

$$ K \times X \longrightarrow X :(a,x) \longrightarrow ax $$
   is bound by the usual laws of distributivity, etc. An R-module is a generalisation of a K-vector space in which the field is replaced by a ring. An R-module is finitely generated if and only if there exists a finite set of generators, and it is called a free R-module if it has a basis (a linearly independent generating set). In this sense, a free R-module is the module analogue of a vector space. Collections, groups, rings, modules and the like, together with their structure-preserving maps, are known as categories.

  Consider two groups \(G\) and \(H\), with operations \( + \) and \( * \) respectively. A mapping from \(G\) to \(H\) is called a homomorphism if, upon interchanging \( + \) and \( * \), we still retain the structure we care about; that is, \( \varphi(x+y) = \varphi(x) * \varphi(y) \). For example, let \( (R, +) \) denote the group of real numbers under addition, and \( (R^{+}, *) \) the group of positive real numbers under multiplication. The mapping \( \varphi(x) = e^{x} \) is a homomorphism from \( (R, +) \) to \( (R^{+}, *) \), for it can be verified that for any two real numbers \(a\) and \(b\):

$$ \varphi(a+b) = \varphi(a) \times \varphi(b) $$
  The analogous notion for topological spaces is the homeomorphism: a continuous mapping with a continuous inverse. Think of the letters \(W, Z, X, I, N\) and \(M\) as topological spaces, and consider three mappings between them:

$$ \varphi: W \longrightarrow Z$$
$$ \psi: N \longrightarrow M$$
$$ \phi: X \longrightarrow I$$
  While the first two maps are homeomorphisms, the third is not: the letter \(X\) has a four-way crossing point, which the letter \(I\) lacks. Intuitively, bending and twisting is fine; cutting is not.

For a homomorphism \(\varphi\) that maps a group G to a group H, we use the following concepts:
  • Kernel of \( \varphi \) is the collection of elements of G that map to the identity of H
  • Image of \( \varphi \) is the collection of elements of H that are mapped onto from G by \( \varphi \)

  The homomorphism \( \varphi \) is a monomorphism if its kernel is trivial (contains only the identity); an epimorphism if its image is all of H; and an isomorphism if it is both an epimorphism and a monomorphism.

  Let \(H\) be a subgroup of \(G\). The quotient group \(G\) \ \(H\) (\(G\) modulo \(H\)) is the group that results when we set every element of \(H\) equal to the identity while still preserving, in some sense, the group structure of \(G\). Let \(x\) and \(y\) be two elements of \(G\), and \(h\) an element of \(H\), with \( x-y=h \) in \(G\). In the quotient group G\H, we identify h with the identity element, so in a sense \( x-y=0 \), or \( x=y \). Hence, for \(G\) \ \(H\) to make sense as a group, we identify \(x\) and \(y\) together. This identification process forms the cosets, which are the elements of \(G\) \ \(H\): each element of \(G\) \ \(H\) is an equivalence class of elements of \(G\) that all differ by an element of \(H\).

  As an example, the quotient group of the integers by the group of even numbers under addition has two cosets: the odd integers and the even integers. Considering the group \(G = (R^{3}, +)\) and its subgroup \(H = \{(h, 0, 0)\}\), a copy of \((R, +)\), the quotient \(G\) \ \(H\) consists of all lines parallel to the x-axis (each line is a coset), which leaves two degrees of freedom: \( R^{3} \) \ \(R = R^{2} \). If instead \(H\) is the xy-plane, a copy of \((R^{2}, +)\), then \(G\) \ \(H\) consists of all planes perpendicular to the z-axis. The quotient group is a disjoint partition of \(G\) whose elements, the members of \(G\) \ \(H\), are copies of \(H\).
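
  The same identification, computed explicitly for a small example of my own choosing, \(H = \{0, 3, 6, 9\}\) inside \(Z_{12}\):

```python
# Cosets of the subgroup H = {0, 3, 6, 9} in Z_12: each coset is an
# equivalence class of elements differing by an element of H.
n = 12
H = {0, 3, 6, 9}
cosets = {frozenset((x + h) % n for h in H) for x in range(n)}
for c in sorted(cosets, key=min):
    print(sorted(c))
# [0, 3, 6, 9], [1, 4, 7, 10], [2, 5, 8, 11]: three cosets, so G\H = Z_3
```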

  Since we can impose the concept of continuity on topological spaces, we regard all topological spaces homeomorphic to each other as identical. The space of equivalence classes of a topological space under an equivalence relation is called a quotient space. For an n-dimensional topological space, its interior is the space with its boundary removed. We denote the boundary by \( \delta \). For example, the boundary of a 3-dimensional solid ball is a 2-dimensional sphere; similarly, the boundary of a 4-dimensional solid ball is a 3-dimensional sphere. Generally speaking, for the n-dimensional ball \( D^{n} \) and the n-dimensional sphere \( S^{n} \), we have:

$$ \delta(D^{n}) = S^{n-1} $$
  Two topological spaces are glued together to make an attached space using a gluing map. For example, let \(A\) and \(B\) be two disjoint topological spaces; that is, the intersection of \(A\) and \(B\) is empty. Let \(X\) and \(Y\) be sub-spaces of \(A\) and \(B\) respectively. Whenever there is a continuous mapping \( \varphi : X \longrightarrow Y \), we obtain the attached space from the union of \(A\) and \(B\) by using \( \varphi \) as the gluing map.

  For example, let \(A\) be a 1-point topological space pnt. Considering:

$$ A = X = pnt $$
$$ B = D^{2} $$
$$ Y = \delta(D^{2}) = S^{1} $$
  We have the gluing map \( \varphi : Y \longrightarrow X \), collapsing \(Y\) to the point pnt. The resulting attached space is homeomorphic to the 2-dimensional sphere \( S^{2} \).

  Let us call an n-dimensional solid ball an n-closed-cell, and the interior of an n-dimensional ball an n-open-cell. We attach together a finite number of closed cells of various dimensions, using gluing maps, to obtain a topological space. The resulting space is called a cell complex. In short, a cell complex is a union of open cells.

  For example, the n-dimensional sphere \(S^{n}\) is a cell complex with a 0-cell and a closed n-cell glued together by the map \( \varphi: \delta(n\text{-cell}) \longrightarrow 0\text{-cell}\). If we attach a closed n-cell to an (n-1)-dimensional sphere using this gluing map, we get an n-dimensional solid ball. Network terminology equivalent to a few topological names is shown in the table below:

  
| Topological Name | Network Name |
| --- | --- |
| 0-cell | Point |
| 0-cycle | Pair of points |
| Basis of 0-dim. non-cycles | Independent points |
| 1-cell | Branch |
| 1-cycle or boundary | Loop |
| 2-cell | Mesh |
| Basis of 2-dim. non-cycles | Independent meshes |


  Notions of n-dimensional round holes and n-dimensional rooms (cavities) in cell complexes can be described by their \( n^{th} \) homotopy and \( n^{th} \) homology groups. Imagine standing on an island and throwing a lasso flat on the ground. If there is no pool of water (no hole) in the island, the lasso will smoothly converge to a point as you start shrinking it. If there is a pool of water on the island (and the lasso encircles it while you stand outside the pool), you cannot shrink the lasso to a point without getting it wet.

  The fundamental group (the 1st homotopy group) of the island measures the degree to which the lasso can or cannot be shrunk to a point. The \( 2^{nd} \) homotopy group of the island measures the degree to which one can shrink a large piece of cloth to a point while always keeping the cloth on the island.

  Homology attributes a family of homology groups to a topological space. Homology theories are not unique: any additive group can serve as the 0th homology group of the point space pnt, and for an arbitrary additive group there exists a corresponding homology theory. With coefficients in the additive group of integers \( Z \), each homology group of a finite cell complex is a direct sum of finitely many copies of \( Z \) and of groups \( Z_{p} \), for prime numbers p. One basic property of the homology groups of an n-dimensional cell complex is that they vanish in dimensions higher than n and lower than \(0\).

  Let X be an n-dimensional cell complex. We denote by \(k_{p}\) the number of p-cells of X. Its p-chain group is the direct sum of \(k_{p}\) copies of the additive integer group \(Z\). A chain complex of X is a long sequence of its chain groups connected by homomorphisms (\(\delta_{p}\)), also called boundary operators. A basic property of chain complexes is that the image of the \( (p+1)^{th} \) boundary operator is contained in the kernel of the \( p^{th} \) boundary operator. In other words, consecutive pairs of homomorphisms annihilate each other: \( \delta_{p-1}\delta_{p} = 0 \). With a slight abuse of notation, this is expressed as \( \delta^{2} = 0 \).
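
  A hand-checkable instance: a filled triangle as a cell complex, with boundary matrices written down by hand (the orientations are my own choice) and the annihilation property checked numerically.

```python
import numpy as np

# Filled triangle: vertices v0,v1,v2; edges e0=[v0,v1], e1=[v1,v2],
# e2=[v0,v2]; one 2-cell f. Columns index p-cells, rows (p-1)-cells.
d1 = np.array([[-1,  0, -1],   # boundary of each edge in terms of vertices
               [ 1, -1,  0],
               [ 0,  1,  1]])
d2 = np.array([[ 1],           # boundary of the face: e0 + e1 - e2
               [ 1],
               [-1]])
print(d1 @ d2)                 # the zero vector: delta^2 = 0
```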

  The homology of a cell complex is the same as the homology of its chain complex. The homology of a chain complex is the direct sum of its \( p^{th} \) homology groups. The \( p^{th} \) homology group of a chain complex is the quotient group of the subgroup of p-cycles by the subgroup of p-boundaries. The p-cycles are the p-chains which are mapped to 0 by the boundary operator; the p-boundaries are the image of the (p+1)-chains under the boundary operator.

  Each generator of a p-dimensional homology group describes a p-dimensional “hole” in the topological space. In lower dimensions, these holes can be described intuitively as connected components (dimension 0), tunnels (dimension 1), and voids (dimension 2). A higher dimensional hole may be considered a part of a topological space where a p-dimensional sphere can be attached.

  The algebraic rank of the \( p^{th} \) homology group is known as the \( p^{th} \) Betti number \( b_{p}\). Betti numbers are commonly used to distinguish different topological spaces (chain complexes) from each other. For example, a 2-sphere has a single connected component, no tunnels, and encloses a void in 3 dimensions. Its Betti numbers are thus \( b_{0}=1, b_{1}=0, b_{2}=1\) (the remaining \(b_{q}\) are zero). In contrast, a doughnut has a single connected component, two loops (one around its centre, the other around its hollow tube), and encloses a void in 3 dimensions. Its Betti numbers are thus \( b_{0}=1, b_{1}=2, b_{2}=1\) (again, the remaining \(b_{q}\) are zero). Betti numbers can thus discern a sphere and a doughnut from each other without requiring any geometrical information about the space. This signature property also applies in higher dimensions, where distinguishing different spaces by purely geometrical means might not be possible.

  Let us turn our attention to the problem of computing the homology groups of cell complexes. The homology groups of a space characterise the number and type of holes in that space and therefore give a fundamental description of its structure. This type of information is used, for example, in understanding the structure of attractors in embedded time series data, or in determining similarities between proteins in molecular biology. The algebra involved in such computation depends on the nature of the coefficient group; we will confine our discussion to the group of integers \(Z\).

  Suppose \(K\) is a cell complex. Let the group \( G_{p}(K) \) be generated by generators \( c_{1}^{p}, c_{2}^{p}, c_{3}^{p}, ..., c_{k_{p}}^{p}\). Making a linear change of basis for each \( G_{p}(K) \), we can represent the action of the boundary operator in terms of these generators. The boundary of any generator is a (p-1)-chain; that is, for any generator \(c_{i}^{p}\):

$$ \delta(c_{i}^{p}) = \sum a_{ij}^{p} c_{j}^{p-1} $$
  The integers \(a_{ij}^{p}\) form a matrix \(A^{p}\), called the p-dimensional incidence matrix. The property \(\delta^{2} = 0\) means that:

$$ 0 = \delta^2 c_{i}^{p}$$
$$ 0 = \delta \sum a_{ij}^{p}c_{j}^{p-1}$$
$$ 0 = \sum a_{ij}^{p}a_{jh}^{p-1}c_{h}^{p-2}$$
  In matrix notation, this can be expressed as:

$$ A^{p}A^{p-1} = 0 $$
  By a suitable choice of generators of the groups \( G_{p}(K) \), all incidence matrices can be converted into diagonal matrices. This follows from the linear-algebraic fact that for any integer matrix \(A\) there exist invertible integer matrices \(M\) and \(N\) such that \(MAN\) is of the form:

$$ MAN = \begin{bmatrix} 1 & & & & & \\ & \ddots & & & & \\ & & d_{1} & & & \\ & & & \ddots & & \\ & & & & d_{h} & \\ & & & & & 0 \end{bmatrix} $$
  All unmarked entries are zero. An additional property of this matrix is that each \( d_{i} \) divides \( d_{i+1} \). We transform each \(A^{p}\) to this form in turn.
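
  This diagonal form with the divisibility property is the Smith normal form of an integer matrix. Recent versions of sympy expose it directly (my tooling choice, not something the text prescribes); here it is applied to the edge-to-vertex incidence matrix of the filled triangle above:

```python
from sympy import Matrix, ZZ
from sympy.matrices.normalforms import smith_normal_form

# Edge-to-vertex incidence matrix d1 of the filled triangle example.
A1 = Matrix([[-1,  0, -1],
             [ 1, -1,  0],
             [ 0,  1,  1]])
print(smith_normal_form(A1, domain=ZZ))
# Expected: diag(1, 1, 0) -- each d_i divides d_{i+1}
```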

   Starting with \( A^{1} \), we find matrices \( M^{1}\) and \( N^{1} \) such that:

$$ D^{1} = M^{1}A^{1}N^{1} = \begin{bmatrix} * & & & \\ & \ddots & & \\ & & * & \\ & & & 0 \end{bmatrix} $$
  The asterisks denote its nonzero diagonal entries. Let \( c^{0} \) and \( c^{1} \) be bases of the first two chain groups \( G_{0}(K) \) and \( G_{1}(K) \), respectively:

$$ c^{0} = \begin{bmatrix} c_{1}^{0}\\ c_{2}^{0}\\ ...\\ c_{p}^{0}\end{bmatrix}, c^{1} = \begin{bmatrix} c_{1}^{1}\\ c_{2}^{1}\\ ...\\ c_{q}^{1}\end{bmatrix}$$
  Replacing \( c^{0} \) by \( (N^{1})^{-1}c^{0} \) and \( c^{1} \) by \(M^{1}c^{1}\), the boundary relations become:

$$ \delta(M^{1}c^{1}) = M^{1}A^{1}c^{0} = M^{1}A^{1}N^{1}(N^{1})^{-1}c^{0} = D^{1}(N^{1})^{-1}c^{0} $$
  The new one-dimensional incidence matrix is now \( D^{1} \). Inductively, it follows that for each p, we find a new basis \(g^{p}\), such that the boundary relations become:

$$ \delta g^{p} = D^{p}g^{p-1} $$
   where, \( g^{p} = \begin{bmatrix} g_{1}^{p}\\ g_{2}^{p}\\ ...\\ g_{k}^{p}\end{bmatrix} \) and

$$ D^{p} = \begin{bmatrix} 1 & & & & & \\ & \ddots & & & & \\ & & t_{1}^{p} & & & \\ & & & \ddots & & \\ & & & & t_{k}^{p} & \\ & & & & & 0 \end{bmatrix} $$
  This makes it convenient to split the column matrix \( g^{p} \), for each p, into a number of sub-columns:

$$ g^{p} = \begin{bmatrix} \alpha^{p}\\ \beta^{p}\\ \gamma^{p} \\ \rho^{p} \\ \varepsilon^{p} \end{bmatrix} $$
  The combined number of elements in \(\alpha^{p}\), \(\beta^{p}\) and \(\gamma^{p}\) is the number of zero rows in \(D^{p}\). The number of elements in \(\rho^{p}\) is the number of units in \(D^{p}\), and is equal to the number of elements in \(\alpha^{p-1}\). The number of elements in \(\varepsilon^{p}\) is the number of non-zero, non-unit elements in \(D^{p}\); this is equal to the number of \(t_{i}^{p}\) in \(D^{p}\). It follows from the fact that \(D^{p}D^{p-1} = 0\) that the total number of elements in \(\rho^{p}\) and \(\varepsilon^{p}\) does not exceed the number of elements in \(\alpha^{p-1}\), \(\beta^{p-1}\) and \(\gamma^{p-1}\). These boundary relations are summarised as:

$$ \delta(\alpha^{p}) = \delta(\beta^{p}) = \delta(\gamma^{p}) = 0, \qquad \delta(\rho^{p}) = \alpha^{p-1} $$
$$ \delta(\varepsilon_{i}^{p}) = t_{i}^{p} \beta_{i}^{p-1} $$
  Bases of the groups \(G_p(K)\) with these properties are called standard bases or canonical bases. Recall the definition of the \(p^{th}\) homology group of a complex: it is the quotient group of p-cycles by p-boundaries. From the boundary relations above, we see the group of p-cycles is generated by the \(\alpha^{p}\), \(\beta^{p}\), and \(\gamma^{p}\). All \(\alpha_{i}^{p}\) are boundaries, while for each i, \(t_{i}^{p+1} \beta_{i}^{p}\) is a boundary. No linear combination of the \(\gamma_{i}^{p}\) is a boundary. It follows that the \(p^{th}\) homology group is isomorphic to the direct sum of \(B_{p}\) and \(T_{p}\), where \(B_{p}\) is the additive group generated by the \(\gamma_{i}^{p}\) and \(T_{p}\) is a direct sum of cyclic groups of orders \(t_{1}^{p+1}, t_{2}^{p+1}, ..., t_{k}^{p+1}\).

  The number of \(\gamma_{i}^{p}\) in the canonical basis is called the \(p^{th}\) Betti number \(b_{p}\) of K. The orders of the cyclic summands of \(T_{p}\) are called the \(p^{th}\) torsion coefficients of K. Let \(n_{p}\) be the number of rows of the incidence matrix \(A^{p}\); this equals the number of generators of \(G_{p}(K)\). The \(p^{th}\) Betti number is also given by:

$$ b_{p} = n_{p} - \mathrm{rank}\, A^{p} - \mathrm{rank}\, A^{p+1} $$
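
  Applying this rank formula to a hollow triangle (three vertices, three edges, no faces: topologically a circle, so one component and one loop); the complex and its orientations are my own toy choice:

```python
import numpy as np

# Hollow triangle: A1 maps edges to vertices; with no 2-cells, A2 = 0.
A1 = np.array([[-1,  0, -1],
               [ 1, -1,  0],
               [ 0,  1,  1]])
n0, n1 = 3, 3                      # numbers of 0-cells and 1-cells
r1 = np.linalg.matrix_rank(A1)     # rank of A^1 (here 2)
b0 = n0 - 0 - r1                   # b_p = n_p - rank A^p - rank A^{p+1}
b1 = n1 - r1 - 0
print(b0, b1)                      # 1 1: one component, one loop
```
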
  A popular theorem states:

$$ \sum_{p} (-1)^{p} b_{p} = \sum_{p} (-1)^{p} n_{p} $$
  where the left-hand side is termed the Euler characteristic. For a chain complex with groups and homomorphisms,

$$ 0\ \overset{\delta}{\longrightarrow} \ ...\ \overset{\delta}{\longrightarrow} K_{p} \ \overset{\delta}{\longrightarrow} ... \ \overset{\delta}{\longrightarrow} K_{1} \ \overset{\delta}{\longrightarrow} K_{0} \ \overset{\delta}{\longrightarrow}0 $$
  the \(p^{th}\) homology group \(H^{p}\) is the quotient of the group of cycles \(Z^{p}\) (the kernel of the \(p^{th}\) boundary operator) by the group of boundaries \(B^{p}\) (the image of the \((p+1)^{th}\) boundary operator):

$$ H^{p} = \frac{Z^{p}}{B^{p}}$$
  Let me now try to mathematically formulate the algorithm synthesis problem.


Theory of Algorithm Synthesis


  Various programs are categorised according to the algorithms they implement. Similarly, various algorithms are categorised according to the computable functions they implement. The collection of computable functions thus forms a quotient of the collection of algorithms, which in turn forms a quotient of the collection of programs. Viewing a program as an enumeration of instructions and data, we adopt the notion that ‘algorithm is data’.

  Algorithms, in algebraic-topological terms, are cell complexes. The data and computational steps of an algorithm have incidence relations between them. Computational steps, or instructions, depend on (are incident to) and produce (are incident on) other computational steps. Data, on the other hand, have no entities incident to them. Regarding these incidence relations as boundary relations, this cell complex produces a chain complex.

  A representative unit of a sub-structure of the chain complex is a program; the collection of allowable data (input) and instruction elements forms a programming language. Just as a geometric figure can be sub-divided into various kinds of cells, there are many ways of choosing a programming language. The algorithm we are looking for must produce some elements of the chosen language; we call them the output. The minimal set of elements that must be given to the desired algorithm is called the input. The basic intuition followed here is to choose a programming language in which we believe the algorithm is easily expressible, and then search for a program in it. The program is a representative of our desired algorithm.

  The ordering of instructions in a program is governed by its boundary. Say an instruction depends on an element, and that element is not part of the input. Then the instruction must be preceded by another instruction that produces that element. This induces a partial ordering of the instructions; that is, the instructions belonging to a program need not be linearly ordered.
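
  That partial order is exactly what a topological sort linearises. A sketch using Python’s standard graphlib, with a tiny instruction set of my own invention:

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each instruction lists the instructions that must produce its
# operands first; inputs have no predecessors (empty boundary).
deps = {
    "load_x": [],
    "load_y": [],
    "t1 = x*x": ["load_x"],
    "t2 = y*y": ["load_y"],
    "out = t1+t2": ["t1 = x*x", "t2 = y*y"],
}
print(list(TopologicalSorter(deps).static_order()))
# One of several valid linear orders: the ordering is only partial.
```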

  Algorithms for sorting, searching, generating permutations, generating clusters, generating subsets, generating partitions, generating graphs, job scheduling, and so on, list all the elements of a set of combinatorial objects. Algorithms themselves, seen as sets of inputs and instructions, are also combinatorial objects. Now assume a programming language is chosen; this means we choose what types of instructions will be allowed in our algorithm. For example, a language used to compute the genetic distance between two DNA sequences has only instructions which either swap two terms or delete a term from either sequence. A program then exists that produces every element of the chosen programming language; let us call it the mother program. Other programs, requiring only partial lists of the chosen programming language, are evidently derivatives of the mother program. The inputs necessary for derivative programs are a subset of the inputs necessary for the mother program.

  The programming language is a finite set. The mother program can produce all elements of the set except those provided as input in the first place. Any derivative program needed to produce only a partial list of elements of the set is obtained by leaving elements off, once all the instructions which depend on them have been performed.

  We characterise an algorithm as an isomorphism class of a finite set of programs. The isomorphism follows a predefined axiomatic structure: we say two programs are isomorphic when there is a one-to-one correspondence between them that preserves that axiomatic structure.

  The programming language, viewed as data, has its input and output kinds provided via finite sets. A hierarchy of levels for data can be set up. The first set \(C_{1}\) is the set of inputs and outputs. The second (higher-level) set \(C_{2}\) is the set of instructions; that is, \(C_{2}\) lists the ways in which some elements of \(C_{1}\) are derived from others. Similarly, \(C_{3}\) is a set which lists the ways in which some elements of \(C_{2}\) are derived from others. Thus the programming language involves a sequence of hierarchies: \(C_{1},C_{2},C_{3},...,C_{n},C_{n+1}\). Note that these levels are arbitrary and relative:

$$ C_{1} = \{inputs, outputs\} $$
$$ C_{2} = \{instructions\} $$
  To draw an analogy to our hierarchical setup, consider a game of chess. At level 1, for example, we have the set of white and black squares and pieces. Players at this level make tactical moves requiring decisions about a particular piece and a particular square. At the higher level 2, we have players operating on sets of squares (diagonals, the centre, etc.), where sets of pieces are positionally played to affect sets of squares. At the third level, players operate on sets of sets of squares (like the queenside as a whole) and pieces, which accounts for a higher order of tactical play. A sense of incidence among these sets can be observed here: each lower-order set is a derivative of its immediate higher-order set. Each hierarchical set is representative of a convex polytope, with each set of vertices called a simplex. A set at level n has dimension n. In a comparable sense, the dimension of an instruction is higher than the dimension of the data on which it operates. For example, a high-level language has dimension 4; assembly language, dimension 3; machine code, dimension 2; and integer values, dimension 1.

  To offer a geometrical view, consider sets of integers (an additive group) covered by a set of first-order instructions, which in turn are covered by a set of second-order instructions, and so on. A polygon is formed from a set of instructions with non-overlapping edges in a plane, such that each vertex is shared between two instructions; a polyhedron is formed from a set of polygons with faces in 3D space, such that each set of instructions is shared between two faces. Repeating this construction (gluing) in sufficiently high-dimensional spaces, we get various higher-dimensional polytopes.

  A zero-dimensional polytope is a null set; in 1 dimension, an integer; in 2 dimensions, an instruction; and so on. An n-dimensional polytope with the smallest number of vertices is an n-simplex. Since we are operating on a hierarchy of convex sets, we have the property that for any two elements of a set, their connection is also a member of that set. It then follows that in a set of simplices, every face of any simplex in the set is also present in the set, and the intersection of any two simplices in the set is a face of both. Such a set is called a simplicial complex. These simplices, when glued, are joined exactly at vertices, edges, triangular faces, and so on, without intersecting each other. Since each set is a derivative of its higher-order set, a continuous function with a continuous inverse exists between the two sets; we then say the two are homeomorphic. Every positive-dimensional polytope can be represented as a simplicial complex without introducing any new set elements.

  The ordering of vertices \(\{v_{0},v_{1},...,v_{q}\}\) of a simplex defines an oriented q-simplex, which we denote \([v_{0},v_{1},...,v_{q}]\), and we say that a simplicial complex is oriented if all its simplices are oriented. Recalling the definition of a chain complex from earlier, let \(C_{q}\) (for each \( q \ge 0 \)) be the vector space whose basis is the set of all q-simplices of the oriented simplicial complex and whose elements, called chains, are linear combinations of the basis vectors. Accordingly, \(C_{q}\) is called a chain group. The dimension of \(C_{q}\) is equal to the number of q-dimensional simplices of the simplicial complex. For q larger than the dimension of the simplicial complex, the vector space \(C_{q}\) is trivial and equals 0. For the set of vector spaces \(C_{q}\), the linear transformation \(\delta_{q}:C_{q} \longrightarrow C_{q-1}\), called the boundary operator, acts on the basis vectors \([v_{0},v_{1},...,v_{q}]\) in the standard alternating-sign way. Taking the sequence of chain groups \(C_{q}\) connected through the boundary operators \(\delta_{q}\), we obtain a chain complex. The rank of the \(q^{th}\) homology group is called the \(q^{th}\) Betti number and equals the number of q-dimensional holes in the simplicial complex: \(\beta_{0}\) is the number of connected components, \(\beta_{1}\) the number of tunnels, \(\beta_{2}\) the number of voids, and so on. Each boundary operator \(\delta_{q}\) has a matrix representation with respect to the bases of the vector spaces \(C_{q}\) and \(C_{q-1}\), with rows indexed by the (q − 1)-simplices and columns indexed by the q-simplices.

  We now have the programming language C:

$$ C = C_{1} \ \cup\ C_{2} \ \cup\ C_{3} \ \cup\ ... \ \cup\ C_{i} \ \cup\ ... \ \cup\ C_{q}$$
  generating a chain complex:

$$ 0\ \overset{\delta}{\longrightarrow} C_{q} \ \overset{\delta}{\longrightarrow} ... \ \overset{\delta}{\longrightarrow} C_{2} \ \overset{\delta}{\longrightarrow} C_{1} \ \overset{\delta}{\longrightarrow}0 $$
  This chain complex has many mother programs. Each mother program that generates all the elements of the programming language corresponds to a canonical basis of the chain complex. Note that since we are operating over an integral domain (additive groups), the chain complex coincides with the simplicial complex discussed earlier. Any two mother programs that generate the elements of the language embedded in this chain complex are isomorphic. Intuitively, we can find the input set of the mother program by calculating a basis of the homology group of the chain complex.

  Consider, for example, the pair of sets \(C_{1}\) and \(C_{2}\), which suggests a program (a part of the mother program). \(C_{1}\) is the set of first-order elements of the program; that is, \(C_{1}\) contains the input elements and the elements obtained by applying instructions to input elements (result elements). \(C_{2}\) is the set of second-order elements; that is, \(C_{2}\) contains the instructions that operate on elements of \(C_{1}\), the elements produced by third-order elements of \(C_{3}\), and the elements input to third-order elements of \(C_{3}\). This hierarchical setup continues up to the \(q^{th}\)-order set \(C_{q}\).

  Any non-expanding formulation of this (co)chain would yield us both the inputs and the instructions by computing a basis. This basis, however, is algebraic, and we need to transform it into a computational form. A few interesting offshoots to pursue later: the topology of algorithms, the geometry of machine code, and the composability and equivalence of two algorithms.