MinHash (or the min-wise independent permutations locality sensitive hashing scheme) is a technique for quickly estimating how similar two sets are. It estimates the Jaccard similarity between two large sets of numbers by comparing compact signatures built with random hash functions, rather than comparing the sets themselves.
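
For reference, the quantity being estimated is the Jaccard similarity, the size of the intersection divided by the size of the union. The sketch below computes it exactly, which is what a MinHash signature approximates at lower cost. The function name `jaccard` and the convention for two empty sets are assumptions made for this illustration and are not part of this class.

```typescript
// Exact Jaccard similarity |A ∩ B| / |A ∪ B| between two sets of numbers.
// Illustrative helper only; it is not part of the documented API.
function jaccard(a: Set<number>, b: Set<number>): number {
  if (a.size === 0 && b.size === 0) {
    return 1 // convention chosen for this sketch
  }
  let shared = 0
  a.forEach(x => {
    if (b.has(x)) shared++
  })
  // |A ∪ B| = |A| + |B| - |A ∩ B|
  return shared / (a.size + b.size - shared)
}

const leftSet = new Set([1, 2, 3, 4])
const rightSet = new Set([3, 4, 5, 6])
// 2 shared values out of 6 distinct values
const exactSimilarity = jaccard(leftSet, rightSet)
```

The exact computation needs both full sets in memory, whereas a MinHash only keeps one number per hash function.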

WARNING: Only MinHash instances produced by the same MinHashFactory can be compared with each other.
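
A likely reason for this restriction is that signatures are only meaningful relative to the hash functions that produced them. The sketch below is a self-contained illustration, not the library's code: the `LinearHash` type, the coefficient values, and the helper names are all hypothetical. It builds signatures for the same values with two different hash families and shows that only the same-family comparison reports the sets as identical.

```typescript
// Hypothetical hash family: h(x) = (a * x + b) mod p, for small non-negative integers x.
type LinearHash = { a: number; b: number; p: number }

// Signature slot i holds the minimum of hash function i over all values.
function signatureOf(values: number[], family: LinearHash[]): number[] {
  return family.map(h =>
    values.reduce((min, v) => Math.min(min, (h.a * v + h.b) % h.p), Infinity)
  )
}

// Fraction of slots on which two signatures agree.
function matchingFraction(x: number[], y: number[]): number {
  return x.filter((slot, i) => slot === y[i]).length / x.length
}

// Two unrelated families, standing in for two different factories.
const familyA: LinearHash[] = [
  { a: 3, b: 7, p: 101 },
  { a: 11, b: 2, p: 103 },
  { a: 5, b: 13, p: 107 },
]
const familyB: LinearHash[] = [
  { a: 7, b: 1, p: 109 },
  { a: 2, b: 5, p: 113 },
  { a: 9, b: 4, p: 127 },
]

const sameValues = [10, 42]
const sameFamily = matchingFraction(
  signatureOf(sameValues, familyA),
  signatureOf(sameValues, familyA)
) // 1: identical sets, identical hash functions
const mixedFamilies = matchingFraction(
  signatureOf(sameValues, familyA),
  signatureOf(sameValues, familyB)
) // 0: identical sets, but the signatures are unrelated
```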

see

"On the resemblance and containment of documents", by Andrei Z. Broder, in Compression and Complexity of Sequences: Proceedings, Positano, Amalfitan Coast, Salerno, Italy, June 11-13, 1997.

author

Thomas Minier

Properties

_hashFunctions: HashFunction[]
_hashing: Hashing
_nbHashes: number
_rng: PRNG
_seed: number
_signature: number[]

Methods

  • add(value: number): void
  • Insert a value into the MinHash and update its signature.

    Parameters

    • value: number

      Value to insert

    Returns void
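
As an illustration of what updating a signature typically involves, the sketch below keeps, for each hash function, the smallest hash seen so far. It is a minimal standalone version under stated assumptions: the `SlotHash` type, the coefficients, and the free-standing `addValue` function are hypothetical and do not reflect the internals of this class.

```typescript
// Hypothetical hash function shape: h(x) = (a * x + b) mod p.
type SlotHash = { a: number; b: number; p: number }

const slotHashes: SlotHash[] = [
  { a: 3, b: 7, p: 101 },
  { a: 11, b: 2, p: 103 },
  { a: 5, b: 13, p: 107 },
]

// One slot per hash function; Infinity marks "no value inserted yet".
const runningSignature: number[] = slotHashes.map(() => Infinity)

// Insert one value: each slot keeps the minimum hash observed so far.
function addValue(value: number): void {
  slotHashes.forEach((h, i) => {
    const hashed = (h.a * value + h.b) % h.p
    runningSignature[i] = Math.min(runningSignature[i], hashed)
  })
}

addValue(10) // hashes: 37, 9, 63
addValue(42) // hashes: 32, 52, 9
// runningSignature is now [32, 9, 9]
```

Because only minima are kept, inserting the same value twice leaves the signature unchanged.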

  • bulkLoad(values: number[]): void
  • Efficiently ingest a set of values into the MinHash and update its signature.

    Parameters

    • values: number[]

      Set of values to load

    Returns void
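
One way a bulk load can be cheaper than repeated single insertions is to process the signature slot by slot, computing the minimum over all values and writing each slot once. The sketch below shows that idea with hypothetical names (`AffineHash`, `bulkHashes`, `bulkLoadSketch`); unlike the documented method, it returns a new array instead of mutating state, purely to keep the example easy to check.

```typescript
// Hypothetical hash function shape: h(x) = (a * x + b) mod p.
type AffineHash = { a: number; b: number; p: number }

const bulkHashes: AffineHash[] = [
  { a: 3, b: 7, p: 101 },
  { a: 11, b: 2, p: 103 },
  { a: 5, b: 13, p: 107 },
]

// For each slot, fold all values into a single minimum, starting from the
// slot's current content so earlier insertions are preserved.
function bulkLoadSketch(signature: number[], values: number[]): number[] {
  return signature.map((current, i) => {
    const h = bulkHashes[i]
    return values.reduce((min, v) => Math.min(min, (h.a * v + h.b) % h.p), current)
  })
}

const freshSignature = bulkHashes.map(() => Infinity)
const loadedSignature = bulkLoadSketch(freshSignature, [10, 42]) // [32, 9, 9]
```

Using `reduce` rather than `Math.min(...hashes)` avoids spreading a very large array into function arguments, which can fail for big inputs.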

  • compareWith(other: MinHash): number
  • Estimate the Jaccard similarity coefficient with another MinHash signature

    Parameters

    • other: MinHash

      MinHash to compare with

    Returns number

    The estimated Jaccard similarity coefficient between the two sets
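
The textbook MinHash estimator relies on the fact that, for a min-wise independent hash function, the probability that two sets share the same minimum hash equals their Jaccard similarity. Averaging over many hash functions gives the estimate. The sketch below implements that position-by-position comparison on plain arrays; `estimateJaccard` is a hypothetical helper, and this class may compute its estimate differently.

```typescript
// Estimate Jaccard similarity as the fraction of signature slots that agree.
// Both signatures are assumed to come from the same hash functions, in the same order.
function estimateJaccard(first: number[], second: number[]): number {
  if (first.length === 0 || first.length !== second.length) {
    throw new Error('signatures must be non-empty and have the same length')
  }
  let matches = 0
  for (let i = 0; i < first.length; i++) {
    if (first[i] === second[i]) matches++
  }
  return matches / first.length
}

// Two made-up signatures that agree on 2 of their 4 slots.
const estimatedSimilarity = estimateJaccard([32, 9, 9, 4], [32, 9, 50, 7]) // 0.5
```

The estimate can only take values that are multiples of 1 / (number of hashes), so more hash functions give a finer and less noisy result.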

  • fromJSON(json: JSON): any
  • Load an Object from a provided JSON object

    Parameters

    • json: JSON

      the JSON object to load

    Returns any

    Return the Object loaded from the provided JSON object

  • isEmpty(): boolean
  • Test if the signature of the MinHash is empty

    Returns boolean

    True if the MinHash is empty, False otherwise

  • nextInt32(): number
  • saveAsJSON(): any

Constructors

Accessors

  • get nbHashes(): number
  • Get a function used to draw random numbers

    Returns PRNG

    A factory function used to draw random integers

  • get seed(): number
  • set seed(seed: number): void
  • Get the seed used in this structure

    Returns number

  • Set the seed for this structure

    Parameters

    • seed: number

      the new seed that will be used in this structure

    Returns void
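
The point of exposing a seed is reproducibility: a seeded generator yields the same sequence every time it is created with the same seed. The sketch below demonstrates this with mulberry32, a small public-domain PRNG chosen here only as a stand-in. It is an assumption for illustration and is not claimed to be the generator behind the PRNG type in this class.

```typescript
// mulberry32: a compact 32-bit seeded PRNG returning floats in [0, 1).
// Stand-in generator for this sketch only.
function mulberry32(seed: number): () => number {
  let state = seed >>> 0
  return () => {
    state = (state + 0x6d2b79f5) >>> 0
    let t = state
    t = Math.imul(t ^ (t >>> 15), t | 1)
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61)
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296
  }
}

// Draw `count` numbers from a generator freshly created with `seed`.
function drawSequence(seed: number, count: number): number[] {
  const next = mulberry32(seed)
  const out: number[] = []
  for (let i = 0; i < count; i++) {
    out.push(next())
  }
  return out
}

const firstRun = drawSequence(999, 5)
const secondRun = drawSequence(999, 5) // same seed, same sequence as firstRun
```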
