API documentation
HDTDocument
- class hdt.HDTDocument
  An HDTDocument loads and queries an HDT file.
  - Constructor:
    file (str): Path to the HDT file to load.
    predicate (boolean): True if additional indexes must be loaded, False otherwise.
- __init__(self, filePath) → hdt.HDTDocument
  Build a new hdt.HDTDocument by loading the HDT file located in filePath.
  - Args:
    filePath (str): The path to the HDT file to load.

    from hdt import HDTDocument

    # Load the HDT file. Missing indexes are generated automatically
    document = HDTDocument("test.hdt")

    # Display some metadata about the HDT document itself
    print("nb triples: %i" % document.total_triples)
    print("nb subjects: %i" % document.nb_subjects)
    print("nb predicates: %i" % document.nb_predicates)
    print("nb objects: %i" % document.nb_objects)
    print("nb shared subject-object: %i" % document.nb_shared)
- convert_id(self: hdt.HDTDocument, id: int, position: hdt.IdentifierPosition) → str
  Transform an Object Identifier into an RDF term. Such identifiers are used in TripleIDs.
  - Args:
    id (int): Object identifier.
    position (hdt.IdentifierPosition): Identifier position.
  - Return:
    The RDF term associated with the Object Identifier, i.e., either a URI or an RDF literal.

    from hdt import HDTDocument, IdentifierPosition

    document = HDTDocument("test.hdt")
    print(document.convert_id(10, IdentifierPosition.Subject))
- convert_id_bytes(self: hdt.HDTDocument, id: int, position: hdt.IdentifierPosition) → bytes
  Transform an Object Identifier into an RDF term, returned as bytes. Such identifiers are used in TripleIDs.
  - Args:
    id (int): Object identifier.
    position (hdt.IdentifierPosition): Identifier position.
  - Return:
    The RDF term associated with the Object Identifier, i.e., either a URI or an RDF literal.

    from hdt import HDTDocument, IdentifierPosition

    document = HDTDocument("test.hdt")
    print(document.convert_id_bytes(10, IdentifierPosition.Subject))
- convert_term(self: hdt.HDTDocument, term: str, position: hdt.IdentifierPosition) → int
  Transform an RDF term into the associated Object Identifier. Such identifiers are used in TripleIDs.
  - Args:
    term (str): RDF term.
    position (hdt.IdentifierPosition): Identifier position.
  - Return:
    The Object Identifier associated with the RDF term.

    from hdt import HDTDocument, IdentifierPosition

    document = HDTDocument("test.hdt")
    print(document.convert_term("http://example.org#Alice", IdentifierPosition.Subject))
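Going by the descriptions above, convert_term and convert_id are inverse lookups for a given position, so a round trip should give back the original term. The sketch below illustrates this. `round_trip` is a hypothetical helper, not part of the hdt API, and it only assumes the two method signatures documented above.

```python
def round_trip(document, term, position):
    """Map an RDF term to its Object Identifier, then back to a term.

    `document` is expected to expose convert_term() and convert_id()
    as documented above; this helper itself is hypothetical.
    """
    term_id = document.convert_term(term, position)
    return term_id, document.convert_id(term_id, position)
```

With a loaded document, `round_trip(document, "http://example.org#Alice", IdentifierPosition.Subject)` would return the identifier together with the term resolved from it.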
- convert_tripleid(self: hdt.HDTDocument, subject: int, predicate: int, object: int) → Tuple[str, str, str]
  Transform an RDF triple from a TripleID representation to a string representation.
  - Args:
    subject (int): unique ID of the subject.
    predicate (int): unique ID of the predicate.
    object (int): unique ID of the object.
  - Return:
    A triple in string representation, i.e., a 3-element tuple (subject, predicate, object).

    from hdt import HDTDocument, IdentifierPosition

    document = HDTDocument("test.hdt")

    # Fetch all triples that match { ?s foaf:name ?o }
    pred = document.convert_term("http://xmlns.com/foaf/0.1/name", IdentifierPosition.Predicate)
    (triples, cardinality) = document.search_triples_ids(0, pred, 0)

    for s, p, o in triples:
        print(s, p, o)  # will print Object identifiers, i.e., integers
        # convert a triple ID to a string format
        print(document.convert_tripleid(s, p, o))
- convert_tripleid_bytes(self: hdt.HDTDocument, subject: int, predicate: int, object: int) → Tuple[bytes, bytes, bytes]
  Transform an RDF triple from a TripleID representation to a bytes representation.
  - Args:
    subject (int): unique ID of the subject.
    predicate (int): unique ID of the predicate.
    object (int): unique ID of the object.
  - Return:
    A triple in bytes representation, i.e., a 3-element tuple (subject, predicate, object).

    from hdt import HDTDocument, IdentifierPosition

    document = HDTDocument("test.hdt")

    # Fetch all triples that match { ?s foaf:name ?o }
    pred = document.convert_term("http://xmlns.com/foaf/0.1/name", IdentifierPosition.Predicate)
    (triples, cardinality) = document.search_triples_ids(0, pred, 0)

    for s, p, o in triples:
        print(s, p, o)  # will print Object identifiers, i.e., integers
        # convert a triple ID to a bytes format
        print(document.convert_tripleid_bytes(s, p, o))
- property file_path
  Return the path to the HDT file currently loaded.
- property nb_objects
  Return the number of objects in the HDT document.
- property nb_predicates
  Return the number of predicates in the HDT document.
- property nb_shared
  Return the number of shared subject-object terms in the HDT document.
- property nb_subjects
  Return the number of subjects in the HDT document.
- search_join(self: hdt.HDTDocument, patterns: List[Tuple[str, str, str]]) → hdt.JoinIterator
  Evaluate a join between a set of triple patterns using an iterator. A triple pattern is a 3-element tuple (subject, predicate, object), where SPARQL variables, i.e., join predicates, are prefixed by a ?.
  - Args:
    patterns (list): list of triple patterns.
  - Return:
    A hdt.JoinIterator, which can be consumed as a Python iterator to evaluate the join.

    from hdt import HDTDocument

    document = HDTDocument("test.hdt")

    # find all actors with their names in the HDT document
    tp_a = ("?s", "http://www.w3.org/1999/02/22-rdf-syntax-ns#type", "http://example.org#Actor")
    tp_b = ("?s", "http://xmlns.com/foaf/0.1/name", "?name")
    iterator = document.search_join([tp_a, tp_b])

    print("estimated join cardinality : %i" % len(iterator))
    for mappings in iterator:
        print(mappings)
- search_join_bytes(self: hdt.HDTDocument, patterns: List[Tuple[str, str, str]]) → hdt.JoinIteratorBytes
  Evaluate a join between a set of triple patterns using an iterator. A triple pattern is a 3-element tuple (subject, predicate, object), where SPARQL variables, i.e., join predicates, are prefixed by a ?.
  - Args:
    patterns (list): list of triple patterns.
  - Return:
    A hdt.JoinIteratorBytes, which can be consumed as a Python iterator to evaluate the join.

    from hdt import HDTDocument

    document = HDTDocument("test.hdt")

    # find all actors with their names in the HDT document
    tp_a = ("?s", "http://www.w3.org/1999/02/22-rdf-syntax-ns#type", "http://example.org#Actor")
    tp_b = ("?s", "http://xmlns.com/foaf/0.1/name", "?name")
    iterator = document.search_join_bytes([tp_a, tp_b])

    print("estimated join cardinality : %i" % len(iterator))
    for mappings in iterator:
        print(mappings)
- search_triples(self: hdt.HDTDocument, subject: str, predicate: str, object: str, limit: int=0, offset: int=0) → Tuple[hdt.TripleIterator, int]
  Search for RDF triples matching the triple pattern { subject predicate object }, with an optional limit and offset. Use empty strings ("") to indicate wildcards.
  - Args:
    subject (str): The subject of the triple pattern to search for.
    predicate (str): The predicate of the triple pattern to search for.
    object (str): The object of the triple pattern to search for.
    limit (int, optional): Maximum number of triples to search for.
    offset (int, optional): Number of matching triples to skip before returning results.
  - Return:
    A 2-element tuple (hdt.TripleIterator, estimated pattern cardinality), where the TripleIterator iterates over matching RDF triples. An RDF triple is a 3-element tuple (subject, predicate, object).

    from hdt import HDTDocument

    document = HDTDocument("test.hdt")

    # Fetch all triples that match { ?s ?p ?o }
    (triples, cardinality) = document.search_triples("", "", "")

    print("cardinality of { ?s ?p ?o }: %i" % cardinality)
    for triple in triples:
        print(triple)
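The limit and offset arguments described above can be combined to read a large result in fixed-size pages. The following is a minimal sketch. `iter_pages` and `page_size` are hypothetical names, not part of the hdt API, and the helper assumes that repeated calls with the same pattern return matching triples in the same order.

```python
def iter_pages(document, subject="", predicate="", obj="", page_size=100):
    """Yield successive lists of at most page_size matching triples."""
    if page_size <= 0:
        # A limit of 0 means "no limit", so it cannot serve as a page size
        raise ValueError("page_size must be positive")
    offset = 0
    while True:
        # limit and offset are passed positionally, in the documented order
        triples, _cardinality = document.search_triples(
            subject, predicate, obj, page_size, offset
        )
        page = list(triples)
        if not page:
            return
        yield page
        if len(page) < page_size:
            return  # a short page means there is nothing left to fetch
        offset += page_size
```

A caller would loop with `for page in iter_pages(document, page_size=500): ...`, which keeps at most one page of triples in memory at a time.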
- search_triples_bytes(self: hdt.HDTDocument, subject: str, predicate: str, object: str, limit: int=0, offset: int=0) → Tuple[hdt.TripleIteratorBytes, int]
  Search for RDF triples matching the triple pattern { subject predicate object }, with an optional limit and offset. Use empty strings ("") to indicate wildcards.
  - Args:
    subject (str): The subject of the triple pattern to search for.
    predicate (str): The predicate of the triple pattern to search for.
    object (str): The object of the triple pattern to search for.
    limit (int, optional): Maximum number of triples to search for.
    offset (int, optional): Number of matching triples to skip before returning results.
  - Return:
    A 2-element tuple (hdt.TripleIteratorBytes, estimated pattern cardinality), where the TripleIteratorBytes iterates over matching RDF triples. An RDF triple is a 3-element tuple (subject, predicate, object).

    from hdt import HDTDocument

    document = HDTDocument("test.hdt")

    # Fetch all triples that match { ?s ?p ?o }
    (triples, cardinality) = document.search_triples_bytes("", "", "")

    print("cardinality of { ?s ?p ?o }: %i" % cardinality)
    for triple in triples:
        print(triple)
- search_triples_ids(self: hdt.HDTDocument, subject: int, predicate: int, object: int, limit: int=0, offset: int=0) → Tuple[hdt.TripleIDIterator, int]
  Same as hdt.HDTDocument.search_triples(), but RDF triples are represented as unique ids (from the HDT Dictionary). Use the integer 0 to indicate wildcards.
  Mapping between ids and RDF terms is done using hdt.HDTDocument.convert_id(), hdt.HDTDocument.convert_term() and hdt.HDTDocument.convert_tripleid().
  - Args:
    subject (int): The Object identifier of the triple pattern's subject.
    predicate (int): The Object identifier of the triple pattern's predicate.
    object (int): The Object identifier of the triple pattern's object.
    limit (int, optional): Maximum number of triples to search for.
    offset (int, optional): Number of matching triples to skip before returning results.
  - Return:
    A 2-element tuple (hdt.TripleIDIterator, estimated pattern cardinality), where the TripleIDIterator iterates over matching RDF triple IDs. An RDF triple ID is a 3-element tuple (subjectID, predicateID, objectID).

    from hdt import HDTDocument, IdentifierPosition

    document = HDTDocument("test.hdt")
    pred = document.convert_term("http://xmlns.com/foaf/0.1/name", IdentifierPosition.Predicate)

    # Fetch all RDF triples that match { ?s foaf:name ?o }
    (triples, cardinality) = document.search_triples_ids(0, pred, 0)

    print("cardinality of { ?s foaf:name ?o }: %i" % cardinality)
    for triple in triples:
        print(triple)
- property total_triples
  Return the total number of triples in the HDT document.
TripleIterator
- class hdt.TripleIterator
  A TripleIterator iterates over triples in an HDT file matching a triple pattern, with an optional limit and offset.
  Such an iterator is returned by hdt.HDTDocument.search_triples().
- has_next(self: hdt.TripleIterator) → bool
  Return true if the iterator still has items to yield, false otherwise.
- property limit
  Return the limit of the iterator, i.e., the maximum number of items the iterator will yield. A limit of 0 means that the iterator is bounded only by the cardinality of the triple pattern currently evaluated.
- property nb_reads
  Return the number of items read by the iterator so far. The offset is not included, so the real position of the iterator in the collection of triples can be computed as offset + nb_reads.
- next(self: hdt.TripleIterator) → Tuple[str, str, str]
  Return the next matching triple read by the iterator, or raise StopIteration if there are no more items to yield.
- property object
  Return the object of the triple pattern currently evaluated.
- property offset
  Return the offset of the iterator, i.e., the number of items the iterator will first skip before yielding. An offset of 0 indicates that the iterator will not skip any items.
- peek(self: hdt.TripleIterator) → Tuple[str, str, str]
  Return the next matching triple read by the iterator without advancing it, or raise StopIteration if there are no more items to yield.
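Because peek() does not advance the iterator, it can be used to look ahead before deciding whether to consume an item. The sketch below groups consecutive triples that share a subject, using only has_next(), peek() and next() as documented above. `group_consecutive_by_subject` is a hypothetical helper, and nothing here assumes that the iterator returns triples sorted by subject: it only groups runs that happen to be adjacent.

```python
def group_consecutive_by_subject(iterator):
    """Yield (subject, triples) for each run of adjacent triples sharing a subject."""
    while iterator.has_next():
        subject = iterator.peek()[0]
        run = []
        # peek() inspects the next triple without consuming it, so the
        # first triple of the following run stays in the iterator
        while iterator.has_next() and iterator.peek()[0] == subject:
            run.append(iterator.next())
        yield subject, run
```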
- property predicate
  Return the predicate of the triple pattern currently evaluated.
- size_hint(self: hdt.TripleIterator) → Tuple[int, bool]
  Get a hint on the cardinality of the triple pattern currently evaluated. The iterator's limit and offset are not taken into account.
  - Return:
    A 2-element tuple (integer, boolean), where the left member is the estimated cardinality, and the right member is True if the estimation is accurate, False otherwise.
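The boolean returned by size_hint() lets a caller report whether the count is exact or only an estimate. A small sketch follows. `describe_cardinality` is a hypothetical helper that relies only on the 2-element tuple documented above.

```python
def describe_cardinality(iterator):
    """Format the size hint of a triple iterator for display."""
    estimate, is_accurate = iterator.size_hint()
    qualifier = "exactly" if is_accurate else "about"
    return "%s %i matching triples" % (qualifier, estimate)
```

Since the hint ignores limit and offset, the figure describes the whole triple pattern, not the number of items the iterator will actually yield.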
- property subject
  Return the subject of the triple pattern currently evaluated.
TripleIDIterator
- class hdt.TripleIDIterator
  A TripleIDIterator iterates over triple IDs in an HDT file matching a triple pattern, with an optional limit and offset.
  Such an iterator is returned by hdt.HDTDocument.search_triples_ids().
  Conversion from a tuple of triple ids into an RDF triple is done using hdt.HDTDocument.convert_tripleid().
- has_next(self: hdt.TripleIDIterator) → bool
  Return true if the iterator still has items to yield, false otherwise.
- property limit
  Return the limit of the iterator, i.e., the maximum number of items the iterator will yield. A limit of 0 means that the iterator is bounded only by the cardinality of the triple pattern currently evaluated.
- property nb_reads
  Return the number of items read by the iterator so far. The offset is not included, so the real position of the iterator in the collection of triples can be computed as offset + nb_reads.
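The position formula given for nb_reads can be wrapped in a one-line helper, for instance to record a checkpoint before stopping a long scan. This is only a sketch. `absolute_position` is a hypothetical name, and it reads just the offset and nb_reads properties documented here.

```python
def absolute_position(iterator):
    """Return the iterator's position in the full collection of matching triples."""
    # nb_reads does not include the offset, so the two are added together
    return iterator.offset + iterator.nb_reads
```

Passing the returned value as the offset of a later search_triples_ids() call with the same pattern would be one way to resume from the checkpoint, assuming results come back in the same order.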
- next(self: hdt.TripleIDIterator) → Tuple[int, int, int]
  Return the next matching triple read by the iterator, or raise StopIteration if there are no more items to yield.
- property object
  Return the object of the triple pattern currently evaluated.
- property offset
  Return the offset of the iterator, i.e., the number of items the iterator will first skip before yielding. An offset of 0 indicates that the iterator will not skip any items.
- peek(self: hdt.TripleIDIterator) → Tuple[int, int, int]
  Return the next matching triple read by the iterator without advancing it, or raise StopIteration if there are no more items to yield.
- property predicate
  Return the predicate of the triple pattern currently evaluated.
- size_hint(self: hdt.TripleIDIterator) → Tuple[int, bool]
  Get a hint on the cardinality of the triple pattern currently evaluated. The iterator's limit and offset are not taken into account.
  - Return:
    A 2-element tuple (integer, boolean), where the left member is the estimated cardinality, and the right member is True if the estimation is accurate, False otherwise.
- property subject
  Return the subject of the triple pattern currently evaluated.
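Triple IDs are compact to iterate over, and they can be turned back into RDF terms only when a readable triple is actually needed. The generator below is a minimal sketch of that lazy conversion. `decode_triples` is a hypothetical helper that calls convert_tripleid() as documented for HDTDocument.

```python
def decode_triples(document, id_triples):
    """Lazily convert (subjectID, predicateID, objectID) tuples to RDF terms."""
    for subject_id, predicate_id, object_id in id_triples:
        # Each conversion happens only when the caller asks for the next item
        yield document.convert_tripleid(subject_id, predicate_id, object_id)
```

Given `(triples, cardinality) = document.search_triples_ids(0, 0, 0)`, iterating over `decode_triples(document, triples)` would yield string triples one at a time.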
JoinIterator
- class hdt.JoinIterator
  A JoinIterator iterates over the set of solution mappings for a join between several triple patterns. It implements the Python iterator protocol and yields sets of solution mappings.
  Such an iterator is returned by hdt.HDTDocument.search_join().
- cardinality(self: hdt.JoinIterator) → int
  Return the estimated join cardinality.
- has_next(self: hdt.JoinIterator) → bool
  Return true if the iterator still has items to yield, false otherwise.
- next(self: hdt.JoinIterator) → Set[Tuple[str, str]]
  Return the next set of solution mappings read by the iterator, or raise StopIteration if there are no more items to yield.
- reset(self: hdt.JoinIterator) → None
  Reset the join, i.e., move the iterator back to its initial state.
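Two common ways of consuming a JoinIterator can be sketched as follows. Both helpers are hypothetical and not part of the hdt API. `solutions_as_dicts` assumes that each 2-element tuple in a solution set is a (variable, value) pair, which the signature above suggests but does not state. `count_solutions` relies on reset() to rewind the iterator after counting.

```python
def solutions_as_dicts(join_iterator):
    """Convert each set of (variable, value) pairs into a dict."""
    return [dict(mappings) for mappings in join_iterator]


def count_solutions(join_iterator):
    """Count the solutions, then rewind so the iterator can be consumed again."""
    count = sum(1 for _ in join_iterator)
    join_iterator.reset()
    return count
```

Counting consumes the whole join, so on large documents the estimate from cardinality() is much cheaper when an exact figure is not required.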
Enumerations
IdentifierPosition
- class hdt.IdentifierPosition
  An enum used to indicate the position (subject, predicate or object) of an Object identifier.
  - Possible values:
    IdentifierPosition.Subject: the subject position
    IdentifierPosition.Predicate: the predicate position
    IdentifierPosition.Object: the object position

    from hdt import IdentifierPosition

    print(IdentifierPosition.Subject)
    print(IdentifierPosition.Predicate)
    print(IdentifierPosition.Object)