Mittwoch, 6. April 2016

Is my problem NP-hard for realz?

On stack exchange it was claimed that my problem would be NP-hard, but i don't believe it yet :-D.

As a reference i was given this paper A note on the computational complexity of a string orbit problem where similar problems are proven to be NP-hard, which are however not the same.

While my problem is a special case of the Constructive String Orbit problem (CSO) (problem 4.1 in the paper) The proof for the NP-hardness of CSO uses a different permutation group and a different ordering of the matrix indices than mine.

In the paper CSO is proven NP hard by reducing Max-Clique on it. But i think that doesn't work for mine.

Lemmy explain:

In a graph with undirected edges a k-Clique are k nodes so that every pair of those nodes is connected.

Example:



That graph has for example a 5-clique, namely the 5 vertices  3,5,7,9,10 , which are all connected with each other.


The problem of finding the largest clique in a graph is NP-hard and so is the decision problem Max-Clique:
Given a graph G and a number k, determine if there is a k-clique in the graph.

I.e. there is no polynomial time algorithm that solve this problem, unless there are polynomial time algorithms for all problems in NP, i.e. unless P=NP.

The proof of Lemma 4.2 in the paper uses the NP-completeness of Max-Clique and in addition it uses a correspondence between certain patterns in boolean matrices and k-cliques.
Namely, a k-clique in a graph corresponds to a submatrix A(I,I) of the adjacency matrix A  of the graph, so that |I|=k and A(I,I) consists only of ones, if we set all diagonal elements to one.




So we can fomulate the Max-Clique problem also as a boolean matrix problem which is thereby also NP-complete:
Given a symmetric matrix A with ones on the diagonal and a number k, determine if there is \(I\subseteq\{1,2,...,n\}\) with \(|I|=k\) so that the submatrix A(I,I) consists only of ones.

 Now the paper uses a specific lexical order on boolean matrices, which is however not the same as mine.

Namely first define a linear order on the index set \(I=\{ (i,j) : 1\leq i,j \leq n \}\) so that
all indices in \(I_k = \{ (i,j) \in I : i=k \vee j=k\}\) always come before the indices  \(I_{k+1}\) in the order.
I.e. \((1,1)< (1,2), (2,2), (2,1) < (1,3),(2,3),(3,3),(3,2),(3,1)<...<(1,n),(2,n),..(n,n).,(n,1)\) and the indices in between are ordererd arbitrarily.

The otder <_ind on the index set from small/light to large/dark




Now they define the lexicographic order on boolean square matrices with respect to this index order \(<_{ind}\), i.e. \(A<B\) iff for the first \(i,j\) in the order \(<_{ind}\) with \(A(i,j)\neq B(i,j)\) it holds that \(A(i,j)=0\) and \(B(i,j)=1\).

Then they reduce the max clique problem to the following problem:
Given boolean matrices \(A\) and \(C\)  is there a boolean matrix \(B\) with \(C<B\) and \(B=P^t A P\) where \(P\) is a permutation matrix.

I.e. can we permute the rows and columns of A so that we get a lexicographically bigger matrix than C? (Whereby however rows and columns can't be permuted independently from each other as in my problem, but instead the same permutation is applied to rows and columns.)

Now Max-Clique can be reduced to that problem as follows:
Given a graph G and a number k,
let \(A\) be the adjacency matrix of G with ones on the diagonal, and
\(C\) the matrix that has \(k\times k\) square of ones in the upper left corner and otherwise only zeros.

Then there is a k-Clique in G if and only if  one can permute columns and rows of A so to get a matrix B that is lexicographically larger than C.
Because any matrix B>C must have also only ones in the upper kxk-square and there is a k-clique then there is a kxk pattern in the adjacency matrix that can be moved to the upper left square.
Matrix C with omes in upper 5x5 square

But that doesn't work for my problem. Because i have a different lexicographical order on the matrices.
In my order any matrix that had k+1 ones in a row would already be equivalent to a matrix that is bigger than the kxk upper square matrix.

And there is no way to check whether an adjacency matrix has a symmetric kxk pattern just by checking whether it is larger than a given matrix for my order.

The difference is that given my order and independent permutation of rows and columns it is not necessary to find the largest symmetric kxk pattern in order to maximize matrices. Instead it is only necessary to find a different kind of pattern, namely:
kxl - rectangles for which k is maximal and l is maximal for that k.

The problem:
Given a boolean matrix A, find (I,J), so that A(I,J) consists only of ones and there is no (X,Y) with |X|>|I| or (|X|=|I| and |Y|>|J|) so that A(X,Y) consists only of ones.

is in PTime unlike the problem of finding the maximal I so that A(I,I) is only ones, which is NP-hard.

The algorithm is simple:

Count the number of ones in each row.
Let M be the set of row indices of rows that have maximal number of ones.
while M isn't empty
  pick a i in M
  create a new bucket M(i):={i} and a counter n_i := 1
  for all j in M
    if A(i,-)=A(j,-) then put j into M(i)
    n_i := n_i+1
    remove j from M
  end for
end while
Choose an i for which n_i is maximal.
Let J be the column indices where i has ones.
Output (M(i), J)
End

What does this problem of finding a maximal rectangular submatrix of ones in a boolean matrix correspond to in graph theory?

Let us generalize the idea of a k-clique.
A k-clique is a single team of k mutual friends.

Let G=(V,E) be a graph. Then i call subsets \( I,J \subseteq V \) of nodes with \(l=|I|,k=|J|\)
(l,k)-wannabe-clique if
everyone from team I is friends with everyone from team J.
iow
all nodes in I are connected with all nodes in J.

So in a (k,l)-wannabe clique there is a core clique \(I\cap J\) of people who belong to both teams who are all mutual friends with each other and all people who are in different teams are mutual friends, but teams who are only in one team are not necessarily friends with each other.
It is a "wannabe" clique, because if everyone would belong to both teams it would be a real clique.

Here is an example:

There is (3,3)-wannabe clique in this graph.
Namely I={2,4,6}, J={7,8,9}.
Bur there is also (1,4)-wannabe clique, which would be larger in my order, becaue lexicographically (from right to left) (1,4)>(3,3)

That's the largest wannabe-clique in this graph.
So, finding the lexicographical largest wannabe clique is easy cheezy Ptime, whereas
finding the largest clique is NP-hard.
My problem only requires finding the largest wannabe-clique, whereas the problem in the paper requires implcitly requiring finding the largest clique.
Thus i have still hope that my problem could be in Ptime after all.

Edit: While i was writing this blog someone on stack exchange claimed my problem would be precisely the canonization problem for isomorphism of undirected bipartite graphs, thus NOT NECESSARILY NP-hard! Yippie ;)

Keine Kommentare:

Kommentar veröffentlichen