Reservoir Sampling

1. Idea

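Reservoir sampling is a randomized algorithm. Given n items (n may be unknown in advance or unbounded), it guarantees that every item is selected with equal probability. The basic idea is to maintain a reservoir of size k (k < n): the first k items fill the reservoir, and each later item, the i-th one seen, enters it with probability k/i, replacing one of the k current entries chosen uniformly at random.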
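The simplest instance of this idea is k = 1, where the reservoir holds a single item. The following is a minimal sketch under that assumption; the class name `SingleItemReservoir` and the helper `pickOne` are hypothetical and only serve as illustration.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

class SingleItemReservoir {
    // Hypothetical helper: returns one item chosen uniformly at random from a
    // stream whose length is not known in advance, or null if it is empty.
    static Integer pickOne(Iterator<Integer> stream, Random rand) {
        Integer chosen = null;
        int seen = 0; // number of items read so far
        while (stream.hasNext()) {
            Integer item = stream.next();
            seen++;
            // Keep the new item with probability 1/seen.
            if (rand.nextInt(seen) == 0) {
                chosen = item;
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(10, 20, 30, 40, 50);
        // Prints one of the five values; which one varies from run to run.
        System.out.println(pickOne(data.iterator(), new Random()));
    }
}
```

The first item is always kept (probability 1/1), and each later item overwrites the current choice with probability 1/seen, so no item count is needed up front.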

2. Implementation

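import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public static List<Integer> reservoirSampling(int[] nums, int k) {
        List<Integer> res = new ArrayList<>();

        // With k or fewer items there is nothing to sample: keep them all.
        if (nums.length <= k) {
            for (int num : nums) {
                res.add(num);
            }
            return res;
        }

        // Fill the reservoir with the first k items.
        for (int i = 0; i < k; i++) {
            res.add(nums[i]);
        }

        Random rand = new Random();

        // Start at index k (the (k+1)-th item) so that no item is skipped.
        for (int i = k; i < nums.length; i++) {
            // j is uniform over [0, i], i.e. over the i + 1 items seen so far.
            int j = rand.nextInt(i + 1);

            // j < k holds with probability k/(i+1); nums[i] then replaces slot j,
            // which is uniform over the k slots of the reservoir.
            if (j < k) {
                res.set(j, nums[i]);
            }
        }
        return res;
    }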
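When the input is a stream rather than an array, the same procedure can be written against an iterator, so the total count never has to be known. This is a sketch under stated assumptions: the class `StreamReservoir` and the method `sample` are hypothetical names, and k is assumed to be non-negative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

class StreamReservoir {
    // Hypothetical variant: samples k items from an iterator of unknown length.
    // If the stream has k or fewer items, all of them are returned.
    static <T> List<T> sample(Iterator<T> stream, int k, Random rand) {
        List<T> reservoir = new ArrayList<>();
        int seen = 0; // number of items read so far
        while (stream.hasNext()) {
            T item = stream.next();
            seen++;
            if (reservoir.size() < k) {
                // The first k items fill the reservoir directly.
                reservoir.add(item);
            } else {
                // j is uniform over [0, seen - 1]; j < k has probability k/seen.
                int j = rand.nextInt(seen);
                if (j < k) {
                    reservoir.set(j, item);
                }
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "b", "c", "d", "e", "f", "g");
        // Prints three of the seven words; the selection varies between runs.
        System.out.println(sample(words.iterator(), 3, new Random()));
    }
}
```

Passing the `Random` instance in as a parameter lets a caller supply a seeded generator when reproducible output is wanted, for example in tests.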

3. Proof (Mathematical Induction)

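Claim: for n items (n > k), every item ends up in the reservoir with probability k/n.

  1. Induction hypothesis: assume the claim holds once i items (i >= k) have been processed. That is, the i-th item was selected with probability k/i, and each of the first i items is in the reservoir with probability k/i.

  2. Base case: for i = k, the first k items are all in the reservoir, with probability 1 = k/k. For i = k + 1, the reservoir holds k items and the (k+1)-th item is selected with probability k/(k+1).

  3. Inductive step: for item i + 1, we need to show that when the (i+1)-th item is selected with probability k/(i+1), each of the first i items is in the reservoir with probability k/(i+1).

    For one of the first i items to be in the reservoir, two things must both happen:

    ① it is in the reservoir before the (i+1)-th selection

    ② it is not replaced during the (i+1)-th selection

    By step 1, before the (i+1)-th selection each of the first i items is in the reservoir with probability k/i.

  4. Probability that an item in the reservoir is replaced:

    First, the (i+1)-th item has to be selected (otherwise nothing is replaced), which happens with probability k/(i+1).

    Second, the slot to replace is chosen uniformly among the k items in the reservoir, so a given item is the unlucky one with probability 1/k. Therefore:

  5. The probability that a given item in the reservoir is replaced is k/(i+1) * 1/k = 1/(i+1), so the probability that it is not replaced is 1 - 1/(i+1) = i/(i+1).

    Hence each of the first i items is in the reservoir with probability k/i * i/(i+1) = k/(i+1).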
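The k/n result can also be checked empirically. The sketch below is an illustration, not part of the proof: it repeats the sampling many times over the indices 0..n-1 and counts how often each index ends up in the reservoir. The class `ReservoirFrequencyCheck`, the method `countSelections`, and the parameter values are hypothetical, and n >= k is assumed.

```java
import java.util.Random;

class ReservoirFrequencyCheck {
    // Runs `trials` independent samplings of k items from the indices 0..n-1
    // (assumes n >= k) and returns how often each index was in the final reservoir.
    static int[] countSelections(int n, int k, int trials, Random rand) {
        int[] counts = new int[n];
        int[] reservoir = new int[k];
        for (int t = 0; t < trials; t++) {
            for (int i = 0; i < n; i++) {
                if (i < k) {
                    reservoir[i] = i; // the first k items fill the reservoir
                } else {
                    int j = rand.nextInt(i + 1); // uniform over [0, i]
                    if (j < k) {
                        reservoir[j] = i; // replace with probability k/(i+1)
                    }
                }
            }
            for (int idx : reservoir) {
                counts[idx]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int n = 10, k = 3, trials = 200_000;
        int[] counts = countSelections(n, k, trials, new Random());
        for (int i = 0; i < n; i++) {
            // Each observed frequency should be close to k/n.
            System.out.printf("item %d: %.4f (expected %.4f)%n",
                    i, (double) counts[i] / trials, (double) k / n);
        }
    }
}
```

With enough trials the observed frequencies should cluster around k/n for every index, early and late items alike.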

4. Time & Space Complexity

Time: O(n)

Space: O(k) for the reservoir itself; O(1) extra space beyond it


Last updated 5 years ago
