API documentation

HDTDocument

class hdt.HDTDocument

An HDTDocument loads an HDT file and lets you query it.

Constructor:
  • file str: Path to the HDT file to load.

  • predicate boolean: True if additional indexes must be loaded, False otherwise.

__init__(self, filePath) → hdt.HDTDocument

Build a new hdt.HDTDocument by loading the HDT file located at filePath.

Args:
  • filePath str: the path to the HDT file to load.

from hdt import HDTDocument

# Load HDT file. Missing indexes are generated automatically
document = HDTDocument("test.hdt")

# Display some metadata about the HDT document itself
print("nb triples: %i" % document.total_triples)
print("nb subjects: %i" % document.nb_subjects)
print("nb predicates: %i" % document.nb_predicates)
print("nb objects: %i" % document.nb_objects)
print("nb shared subject-object: %i" % document.nb_shared)

convert_id(self: hdt.HDTDocument, id: int, position: hdt.IdentifierPosition) → str

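Transform an Object Identifier into an RDF term. Such identifiers are used in TripleIDs.

Args:
  • id int: the Object Identifier to convert.

  • position hdt.IdentifierPosition: the position (subject, predicate or object) of the identifier.

Return:

The RDF term associated with the Object Identifier, i.e., either a URI or an RDF literal.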

from hdt import HDTDocument, IdentifierPosition
document = HDTDocument("test.hdt")
print(document.convert_id(10, IdentifierPosition.Subject))

convert_id_bytes(self: hdt.HDTDocument, id: int, position: hdt.IdentifierPosition) → bytes

Transform an Object Identifier into an RDF term, returned as bytes. Such identifiers are used in TripleIDs.

Args:
  • id int: the Object Identifier to convert.

  • position hdt.IdentifierPosition: the position (subject, predicate or object) of the identifier.

Return:

The RDF term associated with the Object Identifier, i.e., either a URI or an RDF literal, as a bytes object.

from hdt import HDTDocument, IdentifierPosition
document = HDTDocument("test.hdt")
print(document.convert_id_bytes(10, IdentifierPosition.Subject))

convert_term(self: hdt.HDTDocument, term: str, position: hdt.IdentifierPosition) → int

Transform an RDF term into the associated Object Identifier. Such identifiers are used in TripleIDs.

Args:
  • term str: the RDF term (URI or RDF literal) to convert.

  • position hdt.IdentifierPosition: the position (subject, predicate or object) at which the term is looked up.

Return:

The Object Identifier associated with the RDF term.

from hdt import HDTDocument, IdentifierPosition
document = HDTDocument("test.hdt")
print(document.convert_term("http://example.org#Alice", IdentifierPosition.Subject))

convert_tripleid(self: hdt.HDTDocument, subject: int, predicate: int, object: int) → Tuple[str, str, str]

Transform an RDF triple from its TripleID representation into its string representation.

Args:
  • subject int: unique ID of the subject.

  • predicate int: unique ID of the predicate.

  • object int: unique ID of the object.

Return:

A triple in string representation, i.e., a 3-element tuple (subject, predicate, object).

from hdt import HDTDocument, IdentifierPosition
document = HDTDocument("test.hdt")

# Fetch all triples that match { ?s foaf:name ?o }
pred = document.convert_term("http://xmlns.com/foaf/0.1/name", IdentifierPosition.Predicate)
(triples, cardinality) = document.search_triples_ids(0, pred, 0)

for s, p, o in triples:
    print(s, p, o)  # prints Object Identifiers, i.e., integers
    # convert the triple IDs to string format
    print(document.convert_tripleid(s, p, o))

convert_tripleid_bytes(self: hdt.HDTDocument, subject: int, predicate: int, object: int) → Tuple[bytes, bytes, bytes]

Transform an RDF triple from its TripleID representation into a representation where each term is bytes.

Args:
  • subject int: unique ID of the subject.

  • predicate int: unique ID of the predicate.

  • object int: unique ID of the object.

Return:

A triple whose terms are bytes, i.e., a 3-element tuple (subject, predicate, object).

from hdt import HDTDocument, IdentifierPosition
document = HDTDocument("test.hdt")

# Fetch all triples that match { ?s foaf:name ?o }
pred = document.convert_term("http://xmlns.com/foaf/0.1/name", IdentifierPosition.Predicate)
(triples, cardinality) = document.search_triples_ids(0, pred, 0)

for s, p, o in triples:
    print(s, p, o)  # prints Object Identifiers, i.e., integers
    # convert the triple IDs to bytes format
    print(document.convert_tripleid_bytes(s, p, o))
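The *_bytes variants hand back raw bytes and leave decoding to the caller. A minimal sketch of that decoding step follows, assuming the terms are UTF-8 encoded (an assumption, not something this API states); decode_triple is a hypothetical helper, not part of hdt.

```python
from typing import Tuple

BytesTriple = Tuple[bytes, bytes, bytes]
StrTriple = Tuple[str, str, str]


def decode_triple(triple: BytesTriple, errors: str = "replace") -> StrTriple:
    """Decode a (subject, predicate, object) triple of bytes into str.

    Hypothetical helper. Assumes UTF-8 encoded terms; `errors` is passed to
    bytes.decode, so "replace" substitutes U+FFFD for invalid sequences
    instead of raising UnicodeDecodeError.
    """
    subject, predicate, obj = triple
    return (
        subject.decode("utf-8", errors),
        predicate.decode("utf-8", errors),
        obj.decode("utf-8", errors),
    )
```

Inside the loop above, decode_triple(document.convert_tripleid_bytes(s, p, o)) would give str terms; pass errors="strict" to fail loudly on malformed data instead.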
property file_path

Return the path to the HDT file currently loaded.

property nb_objects

Return the number of objects in the HDT document.

property nb_predicates

Return the number of predicates in the HDT document.

property nb_shared

Return the number of shared subject-objects in the HDT document, i.e., RDF terms that appear both as a subject and as an object.

property nb_subjects

Return the number of subjects in the HDT document.
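To make the notion of a shared subject-object concrete, here is a naive sketch that computes the shared terms directly from plain string triples. shared_terms is a hypothetical helper, not part of hdt; for the same data its size should match nb_shared, although HDT keeps that count in its dictionary rather than scanning triples.

```python
from typing import Iterable, Set, Tuple


def shared_terms(triples: Iterable[Tuple[str, str, str]]) -> Set[str]:
    """Return the RDF terms that appear both as a subject and as an object."""
    subjects: Set[str] = set()
    objects: Set[str] = set()
    for subject, _predicate, obj in triples:
        subjects.add(subject)
        objects.add(obj)
    # Shared terms are exactly the intersection of the two sets.
    return subjects & objects
```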

search_join(self: hdt.HDTDocument, patterns: List[Tuple[str, str, str]]) → hdt.JoinIterator

Evaluate a join between a set of triple patterns using an iterator. A triple pattern itself is a 3-element tuple (subject, predicate, object), where SPARQL variables, i.e., the join variables, are prefixed by a ?.

Args:
  • patterns list: the triple patterns to join.

Return:

An hdt.JoinIterator, which can be consumed as a Python iterator to evaluate the join.

from hdt import HDTDocument
document = HDTDocument("test.hdt")

# find all actors with their names in the HDT document
tp_a = ("?s", "http://www.w3.org/1999/02/22-rdf-syntax-ns#type", "http://example.org#Actor")
tp_b = ("?s", "http://xmlns.com/foaf/0.1/name", "?name")
iterator = document.search_join([tp_a, tp_b])

print("estimated join cardinality: %i" % len(iterator))
for mappings in iterator:
    print(mappings)
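Each item yielded by the join is a set of (str, str) pairs, per the signature of hdt.JoinIterator.next(). Assuming each pair is (variable, value), a hypothetical helper mappings_to_dict can turn one solution into a dictionary keyed by variable name. It is a sketch, not part of hdt.

```python
from typing import Dict, Iterable, Tuple


def mappings_to_dict(mappings: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Turn one set of solution mappings into a {variable: value} dict.

    Hypothetical helper. Assumes each pair is (variable, value) and that a
    variable is bound at most once per solution, as in SPARQL.
    """
    solution: Dict[str, str] = {}
    for variable, value in mappings:
        if variable in solution and solution[variable] != value:
            # Conflicting bindings should not occur in a valid solution.
            raise ValueError(f"conflicting bindings for {variable}")
        solution[variable] = value
    return solution
```

Inside the loop above, mappings_to_dict(mappings).get("?name") would then read one variable. Whether variable names keep their leading ? in the yielded pairs is not stated here, so inspect the printed mappings first.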
search_join_bytes(self: hdt.HDTDocument, patterns: List[Tuple[str, str, str]]) → hdt.JoinIteratorBytes

Evaluate a join between a set of triple patterns using an iterator. A triple pattern itself is a 3-element tuple (subject, predicate, object), where SPARQL variables, i.e., the join variables, are prefixed by a ?.

Args:
  • patterns list: the triple patterns to join.

Return:

An hdt.JoinIteratorBytes, which can be consumed as a Python iterator to evaluate the join.

from hdt import HDTDocument
document = HDTDocument("test.hdt")

# find all actors with their names in the HDT document
tp_a = ("?s", "http://www.w3.org/1999/02/22-rdf-syntax-ns#type", "http://example.org#Actor")
tp_b = ("?s", "http://xmlns.com/foaf/0.1/name", "?name")
iterator = document.search_join_bytes([tp_a, tp_b])

print("estimated join cardinality: %i" % len(iterator))
for mappings in iterator:
    print(mappings)

search_triples(self: hdt.HDTDocument, subject: str, predicate: str, object: str, limit: int=0, offset: int=0) → Tuple[hdt.TripleIterator, int]

Search for RDF triples matching the triple pattern { subject predicate object }, with an optional limit and offset. Use empty strings ("") to indicate wildcards.

Args:
  • subject str: The subject of the triple pattern to search for.

  • predicate str: The predicate of the triple pattern to search for.

  • object str: The object of the triple pattern to search for.

  • limit int optional: Maximum number of triples to return (0, the default, means no limit).

  • offset int optional: Number of matching triples to skip before returning results.

Return:

A 2-element tuple (hdt.TripleIterator, estimated pattern cardinality), where the TripleIterator iterates over matching RDF triples.

An RDF triple itself is a 3-element tuple (subject, predicate, object).

from hdt import HDTDocument
document = HDTDocument("test.hdt")

# Fetch all triples that match { ?s ?p ?o }
(triples, cardinality) = document.search_triples("", "", "")

print("cardinality of { ?s ?p ?o }: %i" % cardinality)
for triple in triples:
    print(triple)
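The limit and offset parameters make it possible to page through a large result set. A minimal sketch follows, relying only on the search_triples() signature documented above; iter_pages and page_size are hypothetical names, not part of hdt. Each page issues a fresh search with a growing offset.

```python
from typing import Iterator, List, Tuple

Triple = Tuple[str, str, str]


def iter_pages(document, subject: str, predicate: str, obj: str,
               page_size: int = 100) -> Iterator[List[Triple]]:
    """Yield successive pages of matching triples using limit and offset.

    `document` must expose search_triples(subject, predicate, object,
    limit=..., offset=...) returning (iterator, cardinality).
    """
    if page_size <= 0:
        # A limit of 0 means "no limit", so it cannot define a page.
        raise ValueError("page_size must be positive")
    offset = 0
    while True:
        triples, _cardinality = document.search_triples(
            subject, predicate, obj, limit=page_size, offset=offset)
        page = list(triples)
        if not page:
            return
        yield page
        if len(page) < page_size:
            return  # a short page means the matches are exhausted
        offset += page_size
```

For example, for page in iter_pages(document, "", "http://xmlns.com/foaf/0.1/name", "", page_size=500): processes the foaf:name triples 500 at a time.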
search_triples_bytes(self: hdt.HDTDocument, subject: str, predicate: str, object: str, limit: int=0, offset: int=0) → Tuple[hdt.TripleIteratorBytes, int]

Search for RDF triples matching the triple pattern { subject predicate object }, with an optional limit and offset. Use empty strings ("") to indicate wildcards.

Args:
  • subject str: The subject of the triple pattern to search for.

  • predicate str: The predicate of the triple pattern to search for.

  • object str: The object of the triple pattern to search for.

  • limit int optional: Maximum number of triples to return (0, the default, means no limit).

  • offset int optional: Number of matching triples to skip before returning results.

Return:

A 2-element tuple (hdt.TripleIteratorBytes, estimated pattern cardinality), where the TripleIteratorBytes iterates over matching RDF triples.

An RDF triple itself is a 3-element tuple (subject, predicate, object).

from hdt import HDTDocument
document = HDTDocument("test.hdt")

# Fetch all triples that match { ?s ?p ?o }
(triples, cardinality) = document.search_triples_bytes("", "", "")

print("cardinality of { ?s ?p ?o }: %i" % cardinality)
for triple in triples:
    print(triple)

search_triples_ids(self: hdt.HDTDocument, subject: int, predicate: int, object: int, limit: int=0, offset: int=0) → Tuple[hdt.TripleIDIterator, int]

Same as hdt.HDTDocument.search_triples(), but RDF triples are represented as unique IDs (from the HDT Dictionary). Use the integer 0 to indicate wildcards.

Mapping between ids and RDF terms is done using hdt.HDTDocument.convert_id(), hdt.HDTDocument.convert_term() and hdt.HDTDocument.convert_tripleid().

Args:
  • subject int: The Object Identifier of the triple pattern’s subject.

  • predicate int: The Object Identifier of the triple pattern’s predicate.

  • object int: The Object Identifier of the triple pattern’s object.

  • limit int optional: Maximum number of triples to return (0, the default, means no limit).

  • offset int optional: Number of matching triples to skip before returning results.

Return:

A 2-element tuple (hdt.TripleIDIterator, estimated pattern cardinality), where the TripleIDIterator iterates over matching RDF triple IDs.

An RDF triple ID itself is a 3-element tuple (subjectID, predicateID, objectID).

from hdt import HDTDocument, IdentifierPosition
document = HDTDocument("test.hdt")

pred = document.convert_term("http://xmlns.com/foaf/0.1/name", IdentifierPosition.Predicate)
# Fetch all RDF triples that match { ?s foaf:name ?o }
(triples, cardinality) = document.search_triples_ids(0, pred, 0)

print("cardinality of { ?s foaf:name ?o }: %i" % cardinality)
for triple in triples:
    print(triple)
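search_triples() uses "" as its wildcard while search_triples_ids() uses 0. A minimal sketch of converting a string pattern into an ID pattern with convert_term() follows; to_id_pattern is a hypothetical helper. It assumes 0 is never a valid identifier (since 0 is the wildcard), so a lookup that returns 0 is treated as an unknown term instead of silently widening the pattern to a wildcard. If convert_term() raises on unknown terms instead, that guard is simply never reached.

```python
from typing import Sequence, Tuple


def to_id_pattern(document, subject: str, predicate: str, obj: str,
                  positions: Sequence) -> Tuple[int, int, int]:
    """Convert a string triple pattern into an ID pattern for search_triples_ids()."""

    def lookup(term: str, position) -> int:
        if not term:
            return 0  # "" (string wildcard) becomes 0 (ID wildcard)
        term_id = document.convert_term(term, position)
        # Assumption: 0 is reserved for the wildcard, so a 0 returned here
        # means the term is not in the dictionary.
        if term_id == 0:
            raise KeyError(term)
        return term_id

    s, p, o = (lookup(term, pos)
               for term, pos in zip((subject, predicate, obj), positions))
    return s, p, o
```

With the real bindings, positions would be (IdentifierPosition.Subject, IdentifierPosition.Predicate, IdentifierPosition.Object), and the result can be unpacked into document.search_triples_ids(*pattern).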
property total_triples

Return the total number of triples in the HDT document.

TripleIterator

class hdt.TripleIterator

A TripleIterator iterates over the triples in an HDT file that match a triple pattern, with an optional limit and offset.

Such an iterator is returned by hdt.HDTDocument.search_triples().

has_next(self: hdt.TripleIterator) → bool

Return True if the iterator still has items to yield, False otherwise.

property limit

Return the limit of the iterator, i.e., the maximum number of items the iterator will yield. A limit of 0 indicates that the iterator limit is the cardinality of the triple pattern currently evaluated.

property nb_reads

Return the number of items read by the iterator so far. This count does not include the offset, so the actual position of the iterator in the collection of matching triples is offset + nb_reads.

next(self: hdt.TripleIterator) → Tuple[str, str, str]

Return the next matching triple read by the iterator, or raise StopIteration if there are no more items to yield.

property object

Return the object of the triple pattern currently evaluated.

property offset

Return the offset of the iterator, i.e., the number of items the iterator will first skip before yielding. An offset of 0 indicates that the iterator will not skip any items.

peek(self: hdt.TripleIterator) → Tuple[str, str, str]

Return the next matching triple without advancing the iterator, or raise StopIteration if there are no more items to yield.
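peek() makes it possible to look ahead without consuming a triple, for example to read one subject's run of triples at a time. The sketch below uses only has_next(), peek() and next(); next_subject_group is a hypothetical helper, and it only produces complete groups if matching triples come back ordered by subject, which is an assumption about the iteration order.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def next_subject_group(iterator) -> List[Triple]:
    """Consume the run of consecutive triples that share the next subject.

    peek() inspects the upcoming triple without consuming it, so the first
    triple of the following run is left unread for the next call.
    """
    group: List[Triple] = []
    if not iterator.has_next():
        return group
    subject = iterator.peek()[0]
    while iterator.has_next() and iterator.peek()[0] == subject:
        group.append(iterator.next())
    return group
```

Calling it repeatedly until it returns an empty list walks the whole result one subject at a time.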

property predicate

Return the predicate of the triple pattern currently evaluated.

size_hint(self: hdt.TripleIterator) → Tuple[int, bool]

Get a hint on the cardinality of the triple pattern currently evaluated. The iterator’s limit and offset are not taken into account.

Return:

A 2-element tuple (integer, boolean), where the left member is the estimated cardinality, and the right member is True if the estimation is accurate, False otherwise.
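Because size_hint() ignores limit and offset, estimating how many triples the iterator will yield over its lifetime means applying them by hand. A minimal sketch using the limit, offset and size_hint() members documented in this section; expected_total is a hypothetical helper, and its result is only as exact as the boolean returned by size_hint().

```python
from typing import Tuple


def expected_total(iterator) -> Tuple[int, bool]:
    """Estimate how many triples the iterator yields in total.

    Skips `offset` matches, then caps the rest at `limit` (0 meaning no cap).
    The boolean is passed through from size_hint(): True only when the
    underlying cardinality estimate is exact.
    """
    cardinality, accurate = iterator.size_hint()
    remaining = max(cardinality - iterator.offset, 0)
    if iterator.limit > 0:
        remaining = min(remaining, iterator.limit)
    return remaining, accurate
```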

property subject

Return the subject of the triple pattern currently evaluated.

TripleIDIterator

class hdt.TripleIDIterator

A TripleIDIterator iterates over the IDs of the triples in an HDT file that match a triple pattern, with an optional limit and offset.

Such an iterator is returned by hdt.HDTDocument.search_triples_ids().

Conversion from a tuple of triple IDs into an RDF triple is done using hdt.HDTDocument.convert_tripleid().

has_next(self: hdt.TripleIDIterator) → bool

Return True if the iterator still has items to yield, False otherwise.

property limit

Return the limit of the iterator, i.e., the maximum number of items the iterator will yield. A limit of 0 indicates that the iterator limit is the cardinality of the triple pattern currently evaluated.

property nb_reads

Return the number of items read by the iterator so far. This count does not include the offset, so the actual position of the iterator in the collection of matching triples is offset + nb_reads.

next(self: hdt.TripleIDIterator) → Tuple[int, int, int]

Return the next matching triple read by the iterator, or raise StopIteration if there are no more items to yield.

property object

Return the object of the triple pattern currently evaluated.

property offset

Return the offset of the iterator, i.e., the number of items the iterator will first skip before yielding. An offset of 0 indicates that the iterator will not skip any items.

peek(self: hdt.TripleIDIterator) → Tuple[int, int, int]

Return the next matching triple without advancing the iterator, or raise StopIteration if there are no more items to yield.

property predicate

Return the predicate of the triple pattern currently evaluated.

size_hint(self: hdt.TripleIDIterator) → Tuple[int, bool]

Get a hint on the cardinality of the triple pattern currently evaluated. The iterator’s limit and offset are not taken into account.

Return:

A 2-element tuple (integer, boolean), where the left member is the estimated cardinality, and the right member is True if the estimation is accurate, False otherwise.

property subject

Return the subject of the triple pattern currently evaluated.

JoinIterator

class hdt.JoinIterator

A JoinIterator iterates over the set of solution mappings for a join between several triple patterns. It implements the Python iterator protocol and yields sets of solution mappings.

Such an iterator is returned by hdt.HDTDocument.search_join().

cardinality(self: hdt.JoinIterator) → int

Return the estimated join cardinality.

has_next(self: hdt.JoinIterator) → bool

Return True if the iterator still has items to yield, False otherwise.

next(self: hdt.JoinIterator) → Set[Tuple[str, str]]

Return the next set of solution mappings read by the iterator, or raise StopIteration if there are no more items to yield.

reset(self: hdt.JoinIterator) → None

Reset the join, i.e., move the iterator back to its initial state.
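cardinality() returns only an estimate. When the exact number of solutions is needed, one option is to drain the iterator and then rewind it with reset(). The sketch below uses has_next(), next() and reset(); count_solutions is a hypothetical helper, and because it evaluates the whole join it can be expensive on large results.

```python
def count_solutions(join_iterator) -> int:
    """Count the solutions of a join exactly, then rewind the iterator.

    Drains the iterator with has_next()/next() and calls reset() so the
    same iterator can be consumed again afterwards.
    """
    count = 0
    while join_iterator.has_next():
        join_iterator.next()
        count += 1
    join_iterator.reset()
    return count
```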

Enumerations

IdentifierPosition

class hdt.IdentifierPosition

An enum used to indicate the position (subject, predicate or object) of an Object identifier.

Possible values:
  • IdentifierPosition.Subject: the subject position

  • IdentifierPosition.Predicate: the predicate position

  • IdentifierPosition.Object: the object position

from hdt import IdentifierPosition
print(IdentifierPosition.Subject)
print(IdentifierPosition.Predicate)
print(IdentifierPosition.Object)