13 Randomized Algorithms¶

Monte Carlo Algorithms V.S. Las Vegas Algorithms

算法对比

	Monte Carlo Algorithms	Las Vegas Algorithms
结果准确性	近似解	确定解
执行时间	确定	不确定
应用场景	近似计算、物理模拟等	需精确解的问题
随机性作用	控制精度（越多越准）	优化执行效率

The worst-case running time is equal to the expected running time within constant factors for any randomized algorithm.

答案

\(\textnormal{F}\)

A Las Vegas algorithm is a randomized algorithm that always gives the correct result, however the runtime of a Las Vegas algorithm differs depending on the input.

A Monte Carlo algorithm is a randomized algorithm whose output may be incorrect with a certain (typically small) probability. The running time for the algorithm is fixed however.

Then if a Monte Carlo algorithm runs in \(O(n^2)\) time, with the probability 50% of producing a correct solution, then there must be a Las Vegas algorithm that can get a solution in \(O(n^2)\) time in expectation.

答案

\(\textnormal{F}\)

Randomized Quicksort

Let \(a = (a_1, a_2, \dots, a_i, \dots, a_j, \dots, a_n)\) denote the list of elements we want to sort. In the quicksort algorithm, if the pivot is selected uniformly at random. Then any two elements get compared at most once and the probability of \(a_i\) and \(a_j\) being compared is \(2/(j - i + 1)\) for \(j > i\), given that \(a_i\) or \(a_j\) is selected as the pivot.

答案

\(\textnormal{F}\)

Reviewing the randomized QuickSort in our course, we always select a central splitter as a pivot before recursions, make sure that each side contains at least \(n/4\) elements. Hence, differing from the deterministic QuickSort, the worst case expected running time of the randomized QuickSort is \(\Theta(N\log N)\).

答案

\(\textnormal{T}\)

A randomized Quicksort algorithm has an \(O(N \log N)\) expected running time, only if all the input permutations are equally likely.

答案

\(\textnormal{F}\)

Consider the randomized quicksort. We have proved that it runs in \(O(n \log n)\) time in expectation even for the worst input. Is the following statement true of false?

There exists some good inputs on which the expected running time of randomized quicksort is \(O(n)\) where \(n\) is the input size.

答案

\(\textnormal{F}\)

To find the \(k\)th smallest number in a set \(S\), randomized quick selection algorithm works in the following way:

If \(| S | = 1\), then \(k = 1\) and return the only element in \(S\).
Randomly select a central splitter \(p \in S\), which is a pivot that divides the set so that each side contains at least \(| S | /4\) elements.
Partition \(S - \{p\}\) into \(S_1\) and \(S_2\), as was done with quicksort.
If \(k \leq | S_1 |\), recursively find the \(k\)th smallest number in \(S_1\). If \(k = | S_1 | + 1\), return the pivot as the answer. Otherwise, recursively find the \((k - | S_1 | - 1)\)st smallest number in \(S_2\).

If \(| S | = n\), then the best upper bound of the expected time complexity of this algorithm is ?

答案

不同于快速排序要求两边都递归，快速选择只用递归包含目标元素的一边，每轮处理的数组越来越小，则有：

\[ \begin{align*} T(n) &\leq T\left(\frac{3n}{4}\right) + O(n) = O(n) + O\left(\frac{3n}{4}\right) + O\left(\left(\frac{3}{4}\right)^2 n\right) + \cdots \\ &= n \cdot \left(1 + \frac{3}{4} + \left(\frac{3}{4}\right)^2 + \dots\right) = n \cdot \frac{1}{1 - \frac{3}{4}} = 4n = O(n) \end{align*} \]

随机算法复杂度的递推

Given a linked list containing \(N\) nodes. Our task is to remove all the nodes. At each step, we randomly choose one node in the current list, then delete the selected node together with all the nodes after it. Here we assume that each time we choose one node uniformly among all the remaining nodes. What is the expected number of steps to remove all the nodes?

答案

令 \(T(n)\) 表示包含 \(n\) 个节点的链表完成删除所有节点的期望步骤数，则：

\[ T(n) = 1 + \frac{1}{n} \left( T(1)+T(2)+...+T(n-1) \right) \]

通过递推式计算小值 \(n\) 对应的 \(T(n)\)，有 \(T(1) = 1\)，\(T(2) = \frac{3}{2}\)，\(T(3) = \frac{11}{6}\)，\(T(4) = \frac{25}{12}\)，观察发现 \(T(n)\) 恰好是第 \(n\) 个调和数，即：

\[ T(n) = H_n = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} \]

而 \(H_n = O(\log n)\)，故 \(n\) 个节点时的期望步骤数是 \(O(\log N)\)

Online Hiring Algorithm

The Online Hiring Algorithm (hire only once) is described as the following:

int OnlineHiring ( EventType C [ ], int N, int k )
{
    int Best = N;
    int BestQ = -INFINITY ;
    for ( i=1; i<=k; i++ ) {
        Qi = interview( i );
        if ( Qi > BestQ ) BestQ = Qi;
    }
    for ( i=k+1; i<=N; i++ ) {
        Qi = interview( i );
        if ( Qi > BestQ ) {
            Best = i;
            break;
        }
    }
    return Best;
}

Assume that the quality input \(C[\,]\) is uniformly random. When \(N = 271\) and \(k = 90\), the probability of hiring the \(N\)-th candidate is?

答案

雇佣第 \(N\) 个候选人的充要条件是前 \(N-1\) 个候选人的最高质量出现在前 \(k\) 个位置，故 \(P = \frac{90}{270} = \frac{1}{3}\)

Consider the online hiring problem, in which we have total \(m\) candidates. First of all, we interview \(n\) candidates but reject them all. Then we hire the first candidate who is better than all of the previous candidates you have interviewed. It is true that the probability of the \(k\)th candidate is the best is \(\frac{n}{k(m-1)}\), where \(k > n\).

答案

第 \(k\) 个候选人恰好是全局最优者的概率是 \(\frac{1}{m}\)，第 \(k\) 个候选人是全局最优者且被雇佣的概率是 \(\frac{n}{m(k-1)}\)，答案为 \(\textnormal{F}\)

Consider the hiring problem. Assume that the \(n\) candidates arrive in random order, and that no two candidates have the same performance. Let \(k\) be some fixed parameter in the range \([1, n]\). We use the following algorithm.

interview the first k candidates, but hire none of them
for candidates i = k+1 to n
    if candidate i is better than all the first k candidates
        hire candidate i

How many candidtes will be hired in expectation?

答案

令 \(X_i = \begin{cases} 1 & \text{if candidate} \,\, i \,\, \text{is hired} \\ 0 & \text{if candidate} \,\, i \,\, \text{is NOT hired} \end{cases}\)

则 \(H = X_{k+1} + X_{k+2} + \dots + X_n\)，\(E[H] = E\left[X_{k+1} + X_{k+2} + \dots + X_n\right] = \sum_{i=k+1}^n E[X_i]\)

第 \(i\) 名候选人被雇佣的条件是比前 \(k\) 名候选人都优秀，故 \(E[X_i] = P(X_i=1) = \frac{1}{k+1}\)

\(\therefore E[H] = \sum\limits_{i=k+1}^n \frac{1}{k+1} = \frac{n-k}{k+1}\)

Consider the hiring problem. Assume that the \(n\) candidates arrive in random order, and that no two candidates have the same performance. Let \(k\) be some even number (偶数) in the range \([1,n]\). We use the following algorithm.

interview the first k candidates, but hire none of them
for candidates i = k+1 to n
    if candidate i is better than at least half of the first k candidates
        hire candidate i

It is easy to see that the above algorithm may hire more than one candidates. How many candidates will be hired in expectation?

答案

令 \(k=2m\)，则条件等价于比前 \(k\) 人中 \(\ge m\) 人优秀，令第 \(i\) 人在这 \(2m+1\) 人中的排名为 \(r\)，则满足雇佣条件的 \(r\) 取值为 \(1,2,\dots,m+1\)，\(\Pr(X_i=1) = \frac{m+1}{2m+1} = \frac{k+2}{2(k+1)}\)，\(E[X] = \sum\limits_{i=k+1}^n \Pr(X_i=1) = \frac{(n-k)(k+2)}{2(k+1)}\)

Max-Cut

Let us consider the problem of finding a large cut in an undirected graph \(G=(V, E)\) which has \(n\) vertices and \(m\) edges. A cut is a partition of the vertices into two disjoint sets, and the value of a cut is the weight of all edges crossing from one side of the partition to the other. Here we consider the case where all edges in the graph have the same weight 1.

Suppose that a partition of \(V\) into two disjoint sets \(A\) and \(B\) is given by a randomized algorithm, in which each vertex is randomly and independently assigned to one of the two sets.Which of the following statements is FALSE?

\(\textnormal{A}\). The expected approximation ratio of the algorithm is at most \(2\).

\(\textnormal{B}\). There exists a partition \(A\) and \(B\) with at least \(\frac{m}{2}\) edges connecting the set \(A\) to the set \(B\).

\(\textnormal{C}\). The probability of finding a cut with value at least \(\frac{m}{2}\) is less than \(\frac{2}{m+2}\).

\(\textnormal{D}\). The expected number of edges in the partition generated by the algorithm is \(\frac{m}{2}\).

答案

割的期望边数 \(E[X] = \sum\limits_{e \in E} E[X_e] = m \times \frac{1}{2} = \frac{m}{2}\)，且至少存在一个划分的割值 \(≥\) 期望，故 \(\textnormal{B}\) 和 \(\textnormal{D}\) 表述正确

由 \(\text{OPT} \leq m\) 得 \(E[X] = \frac{m}{2} \geq \frac{\text{OPT}}{2}\)，则 \(\rho \leq 2\)，故 \(\textnormal{A}\) 表述正确

若 \(P(X \geq \frac{m}{2}) < \frac{2}{m+2}\)，则 \(P(X < \frac{m}{2}) \geq 1 - \frac{2}{m+2} = \frac{m}{m+2}\)，此时有：

\[ \begin{align*} E[X] &= E\left[X \mid X < \frac{m}{2}\right] \cdot P\left(X < \frac{m}{2}\right) + E\left[X \mid X \geq \frac{m}{2}\right] \cdot P\left(X \geq \frac{m}{2}\right) \\ &\leq \left( \frac{m}{2} - 1 \right) \cdot \frac{m}{m+2} + m \cdot \frac{2}{m+2} = \frac{m}{2} \end{align*} \]

与 \(E[X] = \frac{m}{2}\) 矛盾，故 \(\textnormal{C}\) 表述错误

综上所述，答案为 \(\textnormal{C}\)

\(3-\)SAT

Given a \(3-\)SAT formula with \(k\) clauses, in which each clause has three variables, the MAX\(-3\)SAT problem is to find a truth assignment that satisfies as many clauses as possible. A simple randomized algorithm is to flip a coin, and to set each variable true with probability \(\frac{1}{2}\), independently for each variable. Which of the following statements is FALSE?

\(\textnormal{A}\). The expected number of clauses satisfied by this random assignment is \(\frac{7k}{8}\).

\(\textnormal{B}\). The probability that a random assignment satisfies at least \(\frac{7k}{8}\) clauses is at most \(\frac{1}{8k}\).

\(\textnormal{C}\). For every instance of \(3-\)SAT, there is a truth assignment that satisfies at least a \(\frac{7}{8}\) fraction of all clauses.

\(\textnormal{D}\). If we repeatedly generate random truth assignments until one of them satisfies \(\ge \frac{7k}{8}\) clauses, then this algorithm is a \(\frac{8}{7}-\)approximation algorithm.

答案

\(E[X_i] = 1- \left( \frac{1}{2} \right)^3 = \frac{7}{8}\)，\(E[X] = \sum\limits_{i=1}^k E[X_i] = \frac{7k}{8}\)，故 \(\textnormal{A}\) 表述正确

当 \(k=1\) 时，\(E[X] = \frac{7}{8} > \frac{1}{8}\)，故 \(\textnormal{B}\) 表述错误

若所有赋值满足的子句数都小于 \(\frac{7k}{8}\)，则平均会小于 \(\frac{7k}{8}\)，与 \(E[X] = \frac{7k}{8}\) 矛盾，因此必然存在至少一个赋值满足的子句数 \(\geq\) 期望即 \(\geq 7k/8\)，故 \(\textnormal{C}\) 表述正确

算法输出的 \(X \geq 7k/8\)，而最优解 \(\text{OPT} \leq k\)（最多满足所有 \(k\) 个子句），则有 \(X \geq \frac{7k}{8} \geq \frac{7}{8} \times \text{OPT}\)，即 \(\text{OPT} \leq \frac{8}{7}X\)，对应 \(\rho = \frac{8}{7}\)，故 \(\textnormal{D}\) 表述正确

综上所述，答案为 \(\textnormal{B}\)

MAX SAT

In the maximum satisfiability problem (MAX SAT), the input consists of \(n\) Boolean variables \(x_1,\ldots,x_n\), \(m\) clauses \(C_1,\ldots,C_m\)(each of which consists of a disjunction(that is an “or”) of some number of the variables and their negations, e.g. \(x_3 \vee \bar{x}_5 \vee x_{11}\), where \(\bar{x}_i\) is the negation of \(x_i\)), and a nonnegative weight \(w_j\) for each clause \(C_j\). The objective of the problem is to find an assignment of the true/false to the \(x_i\) that maximizes the weight of the satisfied clauses.

A variable or a negated variable is a literal. The number of literals in a clause is called its length. Denote \(l_j\) to be the length of a clause \(C_j\) . Clauses of length 1 are called unit clauses.

Randomized algorithm RA: Setting each \(x_i\) to true with probability \(p\) independently.

Which of the following statement is false?

\(\textnormal{A}\). Let \(p=1/2\), the randomized algorithm RA is a \(2-\)approximation algorithm

\(\textnormal{B}\). If \(l_j \geq 3\) for each clause \(C_j\). Let \(p=1/2\), the randomized algorithm RA is a 9/8-approximation algorithm

\(\textnormal{C}\). If MAX SAT instances do not have unit clauses \(\bar{x}_i\), we can obtain a randomized \(\frac{2}{\sqrt{5}-1} \approx 1.618-\)approximation algorithm for MAX SAT

\(\textnormal{D}\). One could obtain a better bound on optimal solution than \(\sum\limits_{j=1}^m w_j\) for MAX SAT

答案

\(E[ALG] = \sum\limits_{j=1}^m w_j \cdot [1 - (\frac{1}{2})^{l_j}] \geq \sum\limits_{j=1}^m w_j \cdot \frac{1}{2}\)，由 \(OPT ≤ \sum w_j\) 且 \(OPT = \sum w_j \cdot s_j\)（\(s_j∈\{0,1\}\) 为 \(C_j\) 是否被 OPT 满足）得 \(E[ALG] ≥ OPT/2\)，故 RA 是 \(2-\)近似算法，故 \(\textnormal{A}\) 表述正确

对 \(l_j≥3\)，\(E[ALG] ≥ \sum w_j \cdot \frac{7}{8}\)，若 \(OPT≈\sum w_j\)（最优解满足所有子句），则 \(E[ALG] \approx \frac{7 \, \text{OPT}}{8}\)，近似比 \(\alpha = \frac{8}{7}\)，故 \(\textnormal{B}\) 表述错误

无单元子句 \(\bar{x}_i\) 时，由 \(p=1-p^2\) 得 \(p=\frac{\sqrt{5}-1}{2} \approx 0.618\)，近似比 \(\alpha = \frac{1}{p} \approx 1.618\)，故 \(\textnormal{C}\) 表述正确

\(\sum w_j\) 是OPT的宽松上界（所有子句满足的权重和），实际存在更紧的上界，故 \(\textnormal{D}\) 表述正确

综上所述，答案为 \(\textnormal{B}\)

随机分配

We have \(m\) balls and \(m\) boxes. Each ball is assigned to one of the \(m\) boxes independently and uniformly at random (i.e., equally likely to each box). Suppose that m is sufficiently large, and \(e\) is the natural number, which of the following is FALSE?

\(\textnormal{A}\). The expected number of balls in a box is \(1\).

\(\textnormal{B}\). The expected number of empty boxes is \(m/e\).

\(\textnormal{C}\). Suppose that one box can only contain one ball, and it will reject other balls if it already contains one. Then, the expected number of boxes containing exactly one ball is \(m/e\).

\(\textnormal{D}\). Suppose a box can accommodate two balls, and will reject any additional balls. The expected number of boxes containing exactly one ball is \(m/e\).

答案

每个球独立以概率 \(\frac{1}{m}\) 放入指定盒子，盒子中球数服从二项分布 \(\text{Binomial}(m, \frac{1}{m})\)，期望为 \(m \cdot \frac{1}{m} = 1\)，故 \(\textnormal{A}\) 表述正确

每个盒子为空概率为 \((1 - \frac{1}{m})^m\)，期望空盒子数为 \(m(1 - \frac{1}{m})^m\)，当 \(m\) 充分大时，\((1 - \frac{1}{m})^m \approx e^{-1}\)，因此期望近似为 \(\frac{m}{e}\)，故 \(\textnormal{B}\) 表述正确

盒子容量为 \(1\) 时，每个盒子最多装一个球，恰好有一个球的盒子数 \(=\) 总盒子数 \(-\) 空盒子数 \(= m - \frac{m}{e}\)，故 \(\textnormal{C}\) 表述错误

每个盒子最终恰有一个球当且仅当恰好有一个球命中该盒子，概率为 \(\binom{m}{1}(\frac{1}{m})(1 - \frac{1}{m})^{m-1} = (1 - \frac{1}{m})^{m-1}\)，期望数为 \(m(1 - \frac{1}{m})^{m-1} \approx \frac{m}{e}\)，故 \(\textnormal{D}\) 表述正确

综上，答案为 \(\textnormal{C}\)