API documentation¶

HDTDocument¶

class hdt.HDTDocument¶

An HDTDocument loads an HDT file and lets you query it.

Constructor:

file str: Path to the HDT file to load.
predicate boolean: True if additional indexes must be loaded, False otherwise.

__init__(self, filePath) → hdt.HDTDocument¶

Build a new hdt.HDTDocument by loading the HDT file located at filePath.

Args:

filePath str: the path to the HDT file to load.

from hdt import HDTDocument

# Load HDT file. Missing indexes are generated automatically
document = HDTDocument("test.hdt")

# Display some metadata about the HDT document itself
print("nb triples: %i" % document.total_triples)
print("nb subjects: %i" % document.nb_subjects)
print("nb predicates: %i" % document.nb_predicates)
print("nb objects: %i" % document.nb_objects)
print("nb shared subject-object: %i" % document.nb_shared)

convert_id(self: hdt.HDTDocument, id: int, position: hdt.IdentifierPosition) → str¶

Transform an Object Identifier to an RDF term. Such identifiers are used in TripleID.

Args:

id int: Object identifier.
position hdt.IdentifierPosition: Identifier position.

Return:

The RDF term associated with the Object Identifier, i.e., either a URI or an RDF literal.

from hdt import HDTDocument, IdentifierPosition
document = HDTDocument("test.hdt")
print(document.convert_id(10, IdentifierPosition.Subject))

convert_id_bytes(self: hdt.HDTDocument, id: int, position: hdt.IdentifierPosition) → bytes¶

Transform an Object Identifier to an RDF term, returned as bytes instead of str. Such identifiers are used in TripleID.

Args:

id int: Object identifier.
position hdt.IdentifierPosition: Identifier position.

Return:

The RDF term associated with the Object Identifier, i.e., either a URI or an RDF literal, as bytes.

from hdt import HDTDocument, IdentifierPosition
document = HDTDocument("test.hdt")
print(document.convert_id_bytes(10, IdentifierPosition.Subject))

convert_term(self: hdt.HDTDocument, term: str, position: hdt.IdentifierPosition) → int¶

Transform an RDF Term to the associated Object Identifier. Such identifiers are used in TripleID.

Args:

term str: RDF Term.
position hdt.IdentifierPosition: Identifier position.

Return:

The Object Identifier associated with the RDF Term.

from hdt import HDTDocument, IdentifierPosition
document = HDTDocument("test.hdt")
print(document.convert_term("http://example.org#Alice", IdentifierPosition.Subject))
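Since convert_term() and convert_id() are described as inverse mappings, a round trip can be sketched as below. This is only an illustration: round_trip is a hypothetical helper, the document argument is assumed to behave like an hdt.HDTDocument, and position is assumed to be an hdt.IdentifierPosition value supplied by the caller.

```python
def round_trip(document, term, position):
    """Convert an RDF term to its identifier, then back to a term.

    Hypothetical helper: `document` is expected to expose convert_term()
    and convert_id() as documented above.
    """
    identifier = document.convert_term(term, position)
    # The same position must be used in both directions
    return identifier, document.convert_id(identifier, position)
```

With a loaded document, round_trip(document, "http://example.org#Alice", IdentifierPosition.Subject) would return the identifier together with the term it decodes to. How a term absent from the dictionary is reported is not covered by this sketch.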

convert_tripleid(self: hdt.HDTDocument, subject: int, predicate: int, object: int) → Tuple[str, str, str]¶

Transform an RDF triple from a TripleID representation to a string representation.

Args:

subject int: unique ID of the subject.
predicate int: unique ID of the predicate.
object int: unique ID of the object.

Return:

A triple in string representation, i.e., a 3-element tuple (subject, predicate, object).

from hdt import HDTDocument, IdentifierPosition
document = HDTDocument("test.hdt")

# Fetch all triples that match { ?s foaf:name ?o }
pred = document.convert_term("http://xmlns.com/foaf/0.1/name", IdentifierPosition.Predicate)
(triples, cardinality) = document.search_triples_ids(0, pred, 0)

for s, p, o in triples:
  print(s, p, o) # will print Object identifiers, i.e., integers
  # convert a triple ID to a string format
  print(document.convert_tripleid(s, p, o))
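When only the decoded triples are needed, the conversion loop above can be wrapped in a generator. The following is a minimal sketch; decode_triples is a hypothetical name, and document is assumed to behave like an hdt.HDTDocument.

```python
def decode_triples(document, id_triples):
    """Lazily convert (subject, predicate, object) ID triples to strings.

    Hypothetical helper: `id_triples` can be any iterable of 3-integer
    tuples, such as the iterator returned by search_triples_ids().
    """
    for subject, predicate, obj in id_triples:
        yield document.convert_tripleid(subject, predicate, obj)
```

Because it is a generator, triples are converted one at a time as the caller consumes them, so no intermediate list is built.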

convert_tripleid_bytes(self: hdt.HDTDocument, subject: int, predicate: int, object: int) → Tuple[bytes, bytes, bytes]¶

Transform an RDF triple from a TripleID representation to a bytes representation.

Args:

subject int: unique ID of the subject.
predicate int: unique ID of the predicate.
object int: unique ID of the object.

Return:

A triple in bytes representation, i.e., a 3-element tuple (subject, predicate, object).

from hdt import HDTDocument, IdentifierPosition
document = HDTDocument("test.hdt")

# Fetch all triples that match { ?s foaf:name ?o }
pred = document.convert_term("http://xmlns.com/foaf/0.1/name", IdentifierPosition.Predicate)
(triples, cardinality) = document.search_triples_ids(0, pred, 0)

for s, p, o in triples:
  print(s, p, o) # will print Object identifiers, i.e., integers
  # convert a triple ID to a bytes format
  print(document.convert_tripleid_bytes(s, p, o))

property file_path¶: Return the path to the HDT file currently loaded

property nb_objects¶: Return the number of objects in the HDT document

property nb_predicates¶: Return the number of predicates in the HDT document

property nb_shared¶: Return the number of shared subject-object in the HDT document

property nb_subjects¶: Return the number of subjects in the HDT document

search_join(self: hdt.HDTDocument, patterns: List[Tuple[str, str, str]]) → hdt.JoinIterator¶

Evaluate a join between a set of triple patterns using an iterator. A triple pattern itself is a 3-element tuple (subject, predicate, object), where SPARQL variables, i.e., join predicates, are prefixed by a ?.

Args:

patterns list: list of triple patterns.

Return:

An hdt.JoinIterator, which can be consumed as a Python iterator to evaluate the join.

from hdt import HDTDocument
document = HDTDocument("test.hdt")

# find all actors with their names in the HDT document
tp_a = ("?s", "http://www.w3.org/1999/02/22-rdf-syntax-ns#type", "http://example.org#Actor")
tp_b = ("?s", "http://xmlns.com/foaf/0.1/name", "?name")
iterator = document.search_join([tp_a, tp_b])

print("estimated join cardinality : %i" % len(iterator))
for mappings in iterator:
  print(mappings)
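Each item yielded by the join is a set of 2-tuples, which can be awkward to look values up in. A minimal sketch of turning each solution into a dictionary follows. solutions_as_dicts is a hypothetical helper, and it assumes each tuple is ordered as (variable, value), which the signature Set[Tuple[str, str]] does not state explicitly.

```python
def solutions_as_dicts(solutions):
    """Turn each set of (variable, value) pairs into a dictionary.

    Hypothetical helper: assumes the first element of each pair is the
    SPARQL variable and the second is the RDF term bound to it.
    """
    for mappings in solutions:
        yield dict(mappings)
```

With this, a solution could be read as row["?name"] instead of scanning the set for the matching pair.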

search_join_bytes(self: hdt.HDTDocument, patterns: List[Tuple[str, str, str]]) → hdt.JoinIteratorBytes¶

Evaluate a join between a set of triple patterns using an iterator that yields bytes. A triple pattern itself is a 3-element tuple (subject, predicate, object), where SPARQL variables, i.e., join predicates, are prefixed by a ?.

Args:

patterns list: list of triple patterns.

Return:

An hdt.JoinIteratorBytes, which can be consumed as a Python iterator to evaluate the join.

from hdt import HDTDocument
document = HDTDocument("test.hdt")

# find all actors with their names in the HDT document
tp_a = ("?s", "http://www.w3.org/1999/02/22-rdf-syntax-ns#type", "http://example.org#Actor")
tp_b = ("?s", "http://xmlns.com/foaf/0.1/name", "?name")
iterator = document.search_join_bytes([tp_a, tp_b])

print("estimated join cardinality : %i" % len(iterator))
for mappings in iterator:
  print(mappings)

search_triples(self: hdt.HDTDocument, subject: str, predicate: str, object: str, limit: int=0, offset: int=0) → Tuple[hdt.TripleIterator, int]¶

Search for RDF triples matching the triple pattern { subject predicate object }, with an optional limit and offset. Use empty strings ("") to indicate wildcards.

Args:

subject str: The subject of the triple pattern to search for.
predicate str: The predicate of the triple pattern to search for.
object str: The object of the triple pattern to search for.
limit int optional: Maximum number of triples to search for.
offset int optional: Number of matching triples to skip before returning results.

Return:

A 2-element tuple (hdt.TripleIterator, estimated pattern cardinality), where the TripleIterator iterates over matching RDF triples.

An RDF triple itself is a 3-element tuple (subject, predicate, object).

from hdt import HDTDocument
document = HDTDocument("test.hdt")

# Fetch all triples that match { ?s ?p ?o }
(triples, cardinality) = document.search_triples("", "", "")

print("cardinality of { ?s ?p ?o }: %i" % cardinality)
for triple in triples:
  print(triple)
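The limit and offset arguments make it possible to read a large result in fixed-size pages. The sketch below is one way to do this and is not part of the library: iter_pages is a hypothetical helper, and document is assumed to behave like an hdt.HDTDocument.

```python
def iter_pages(document, subject, predicate, obj, page_size):
    """Yield lists of at most `page_size` matching triples.

    Hypothetical helper built on search_triples(limit=..., offset=...).
    """
    if page_size <= 0:
        # A limit of 0 means "no limit", so it cannot be used as a page size
        raise ValueError("page_size must be a positive integer")
    offset = 0
    while True:
        triples, _cardinality = document.search_triples(
            subject, predicate, obj, limit=page_size, offset=offset
        )
        page = list(triples)
        if not page:
            return
        yield page
        if len(page) < page_size:
            # A short page means the pattern has no more matches
            return
        offset += len(page)
```

Each page issues a fresh search, which keeps memory bounded at the cost of re-evaluating the pattern from the given offset.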

search_triples_bytes(self: hdt.HDTDocument, subject: str, predicate: str, object: str, limit: int=0, offset: int=0) → Tuple[hdt.TripleIteratorBytes, int]¶

Search for RDF triples matching the triple pattern { subject predicate object }, with an optional limit and offset. Use empty strings ("") to indicate wildcards.

Args:

subject str: The subject of the triple pattern to search for.
predicate str: The predicate of the triple pattern to search for.
object str: The object of the triple pattern to search for.
limit int optional: Maximum number of triples to search for.
offset int optional: Number of matching triples to skip before returning results.

Return:

A 2-element tuple (hdt.TripleIteratorBytes, estimated pattern cardinality), where the TripleIteratorBytes iterates over matching RDF triples.

An RDF triple itself is a 3-element tuple (subject, predicate, object).

from hdt import HDTDocument
document = HDTDocument("test.hdt")

# Fetch all triples that match { ?s ?p ?o }
(triples, cardinality) = document.search_triples_bytes("", "", "")

print("cardinality of { ?s ?p ?o }: %i" % cardinality)
for triple in triples:
  print(triple)

search_triples_ids(self: hdt.HDTDocument, subject: int, predicate: int, object: int, limit: int=0, offset: int=0) → Tuple[hdt.TripleIDIterator, int]¶

Same as hdt.HDTDocument.search_triples(), but RDF triples are represented as unique ids (from the HDT Dictionary). Use the integer 0 to indicate wildcards.

Mapping between ids and RDF terms is done using hdt.HDTDocument.convert_id(), hdt.HDTDocument.convert_term() and hdt.HDTDocument.convert_tripleid().

Args:

subject int: The Object identifier of the triple pattern’s subject.
predicate int: The Object identifier of the triple pattern’s predicate.
object int: The Object identifier of the triple pattern’s object.
limit int optional: Maximum number of triples to search for.
offset int optional: Number of matching triples to skip before returning results.

Return:

A 2-element tuple (hdt.TripleIDIterator, estimated pattern cardinality), where the TripleIDIterator iterates over matching RDF triple IDs.

An RDF triple ID itself is a 3-element tuple (subjectID, predicateID, objectID).

from hdt import HDTDocument, IdentifierPosition
document = HDTDocument("test.hdt")

pred = document.convert_term("http://xmlns.com/foaf/0.1/name", IdentifierPosition.Predicate)
# Fetch all RDF triples that match { ?s foaf:name ?o }
(triples, cardinality) = document.search_triples_ids(0, pred, 0)

print("cardinality of { ?s foaf:name ?o }: %i" % cardinality)
for triple in triples:
  print(triple)
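Building an ID pattern from RDF terms combines convert_term() with the 0 wildcard. The following is a minimal sketch under stated assumptions: search_by_terms is a hypothetical helper, document is assumed to behave like an hdt.HDTDocument, and positions is assumed to be the hdt.IdentifierPosition enum, passed in by the caller.

```python
def search_by_terms(document, positions, subject="", predicate="", obj=""):
    """Run search_triples_ids() from string terms; "" is a wildcard.

    Hypothetical helper: `positions` is expected to expose Subject,
    Predicate and Object, like hdt.IdentifierPosition.
    """
    def to_id(term, position):
        # search_triples_ids() uses the integer 0 as its wildcard
        return 0 if term == "" else document.convert_term(term, position)

    return document.search_triples_ids(
        to_id(subject, positions.Subject),
        to_id(predicate, positions.Predicate),
        to_id(obj, positions.Object),
    )
```

This sketch does not check whether a term exists in the dictionary; callers may want to validate the identifiers returned by convert_term() before searching.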

property total_triples¶: Return the total number of triples in the HDT document

TripleIterator¶

class hdt.TripleIterator¶

A TripleIterator iterates over the triples in an HDT file that match a triple pattern, with an optional limit and offset.

Such an iterator is returned by hdt.HDTDocument.search_triples().

has_next(self: hdt.TripleIterator) → bool¶: Return true if the iterator still has items to yield, false otherwise.

property limit¶: Return the limit of the iterator, i.e., the maximum number of items the iterator will yield. A limit of 0 indicates that the iterator limit is the cardinality of the triple pattern currently evaluated.

property nb_reads¶: Return the number of items read by the iterator so far. The offset is not included, so the real position of the iterator in the collection of triples can be computed as offset + nb_reads.

next(self: hdt.TripleIterator) → Tuple[str, str, str]¶: Return the next matching triple read by the iterator, or raise StopIteration if there are no more items to yield.

property object¶: Return the object of the triple pattern currently evaluated.

property offset¶: Return the offset of the iterator, i.e., the number of items the iterator will first skip before yielding. An offset of 0 indicates that the iterator will not skip any items.

peek(self: hdt.TripleIterator) → Tuple[str, str, str]¶: Return the next matching triple read by the iterator without advancing it, or raise StopIteration if there are no more items to yield.

property predicate¶: Return the predicate of the triple pattern currently evaluated.

size_hint(self: hdt.TripleIterator) → Tuple[int, bool]¶

Get a hint on the cardinality of the triple pattern currently evaluated. The iterator’s limit and offset are not taken into account.

Return: A 2-element tuple (integer, boolean), where the left member is the estimated cardinality, and the right member is True if the estimation is accurate, False otherwise.

property subject¶: Return the subject of the triple pattern currently evaluated.
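The relation offset + nb_reads described for nb_reads can be used to stop a scan and resume it later with a new search. The sketch below is illustrative only: read_chunk is a hypothetical helper, and iterator is assumed to expose has_next(), next(), offset and nb_reads as documented for hdt.TripleIterator.

```python
def read_chunk(iterator, max_items):
    """Read up to `max_items` triples and report where to resume.

    Hypothetical helper: the second return value is the absolute
    position in the pattern's results, i.e. offset + nb_reads.
    """
    items = []
    while len(items) < max_items and iterator.has_next():
        items.append(iterator.next())
    resume_offset = iterator.offset + iterator.nb_reads
    return items, resume_offset
```

The returned position could then be passed as the offset argument of a later search_triples() call to continue where this scan stopped.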

TripleIDIterator¶

class hdt.TripleIDIterator¶

A TripleIDIterator iterates over the IDs of the triples in an HDT file that match a triple pattern, with an optional limit and offset.

Such an iterator is returned by hdt.HDTDocument.search_triples_ids().

Conversion from a tuple of triple ids into an RDF triple is done using hdt.HDTDocument.convert_tripleid().

has_next(self: hdt.TripleIDIterator) → bool¶: Return true if the iterator still has items to yield, false otherwise.

property limit¶: Return the limit of the iterator, i.e., the maximum number of items the iterator will yield. A limit of 0 indicates that the iterator limit is the cardinality of the triple pattern currently evaluated.

property nb_reads¶: Return the number of items read by the iterator so far. The offset is not included, so the real position of the iterator in the collection of triples can be computed as offset + nb_reads.

next(self: hdt.TripleIDIterator) → Tuple[int, int, int]¶: Return the next matching triple read by the iterator, or raise StopIteration if there are no more items to yield.

property object¶: Return the object of the triple pattern currently evaluated.

property offset¶: Return the offset of the iterator, i.e., the number of items the iterator will first skip before yielding. An offset of 0 indicates that the iterator will not skip any items.

peek(self: hdt.TripleIDIterator) → Tuple[int, int, int]¶: Return the next matching triple read by the iterator without advancing it, or raise StopIteration if there are no more items to yield.

property predicate¶: Return the predicate of the triple pattern currently evaluated.

size_hint(self: hdt.TripleIDIterator) → Tuple[int, bool]¶

Get a hint on the cardinality of the triple pattern currently evaluated. The iterator’s limit and offset are not taken into account.

Return: A 2-element tuple (integer, boolean), where the left member is the estimated cardinality, and the right member is True if the estimation is accurate, False otherwise.

property subject¶: Return the subject of the triple pattern currently evaluated.
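Because size_hint() returns both an estimate and an accuracy flag, callers can report the two cases differently. A minimal sketch, with describe_cardinality as a hypothetical helper and iterator assumed to expose size_hint() as documented above:

```python
def describe_cardinality(iterator):
    """Summarize size_hint() as a short human-readable string.

    Hypothetical helper: works with any object whose size_hint()
    returns an (estimate, is_accurate) pair.
    """
    estimate, is_accurate = iterator.size_hint()
    qualifier = "exactly" if is_accurate else "approximately"
    return "%s %i matching triples" % (qualifier, estimate)
```

Note that the hint ignores the iterator's limit and offset, so it describes the whole pattern rather than the remaining items.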

JoinIterator¶

class hdt.JoinIterator¶

A JoinIterator iterates over the set of solution mappings for a join between several triple patterns. It implements the Python iterator protocol and yields sets of solution mappings.

Such an iterator is returned by hdt.HDTDocument.search_join().

cardinality(self: hdt.JoinIterator) → int¶: Return the estimated join cardinality.

has_next(self: hdt.JoinIterator) → bool¶: Return true if the iterator still has items to yield, false otherwise.

next(self: hdt.JoinIterator) → Set[Tuple[str, str]]¶: Return the next set of solution mappings read by the iterator, or raise StopIteration if there are no more items to yield.

reset(self: hdt.JoinIterator) → None¶: Reset the join, i.e., move the iterator back to its initial state.
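Since cardinality() is only an estimate, an exact count requires consuming the join, after which reset() makes the iterator usable again. The sketch below shows that pattern; count_solutions is a hypothetical helper, and iterator is assumed to expose has_next(), next() and reset() as documented for hdt.JoinIterator.

```python
def count_solutions(iterator):
    """Count the join's solutions exactly, then rewind the iterator.

    Hypothetical helper: consumes the whole join once, so it costs a
    full evaluation before the caller reads any result.
    """
    count = 0
    while iterator.has_next():
        iterator.next()
        count += 1
    # Move the iterator back to its initial state for the caller
    iterator.reset()
    return count
```

After the call, the same iterator can be looped over as usual to read the solutions themselves.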

Enumerations¶

IdentifierPosition¶

class hdt.IdentifierPosition¶

An enum used to indicate the position (subject, predicate or object) of an Object identifier.

Possible values:

IdentifierPosition.Subject: the subject position
IdentifierPosition.Predicate: the predicate position
IdentifierPosition.Object: the object position

from hdt import IdentifierPosition
print(IdentifierPosition.Subject)
print(IdentifierPosition.Predicate)
print(IdentifierPosition.Object)