hypermorph package¶
Submodules¶
hypermorph.clients module¶
-
class
hypermorph.clients.
ConnectionPool
(db_client, dialect=None, host=None, port=None, user=None, password=None, database=None, path=None, trace=0)¶ Bases:
object
ConnectionPool manages connections to a DBMS and extends database API clients:
i) with useful introspective properties (api, database, sqlalchemy_engine, last_query, query_stats)
ii) with a uniform SQL command interface (the sql method)
iii) with common methods to access database metadata (get_tables_metadata, get_columns_metadata)
- HyperMorph currently supports the following three database API clients (self._api_name)
Clickhouse-Driver
MySQL-Connector
- SQLAlchemy with the following three dialects (self._sqlalchemy_dialect)
pymysql
clickhouse
sqlite
- Consequently various database APIs are categorized as (self._api_category)
MYSQL CLICKHOUSE SQLite
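A minimal usage sketch. The db_client string 'mysql-connector' and the connection values are illustrative assumptions, not documented constants:

from hypermorph.clients import ConnectionPool

# Hypothetical connection values; the db_client string is assumed to select
# the MySQL-Connector API client listed above
pool = ConnectionPool(db_client='mysql-connector', host='localhost', port=3306,
                      user='demo', password='demo', database='northwind')
tables_meta = pool.get_tables_metadata()                  # metadata for the tables of the database
columns_meta = pool.get_columns_metadata(table='customers')
result = pool.sql('SELECT COUNT(*) FROM customers')       # uniform SQL interface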
-
property
api_category
¶
-
property
api_name
¶
-
clickhouse_connections
= 0¶
-
property
connector
¶
-
property
database
¶
-
get_columns_metadata
(table=None, columns=None, fields=None, aggr=None, **kwargs)¶ - Parameters
table – name of the table in database
columns – list of ClickHouse column names
fields – select specific metadata fields for the columns of a table in the database; dictionary metadata field names are dependent on the specific DBMS, e.g. MySQL, SQLite, ClickHouse, etc.
aggr – aggregate metadata results for the columns of a clickhouse table
kwargs – pass extra parameters to sql() method
- Returns
metadata for the columns of a table(s) in a database e.g. name of column, default value, nullable, etc
-
get_tables_metadata
(fields=None, clickhouse_engine=None, name=None, **kwargs)¶ - Parameters
clickhouse_engine – type of storage engine for clickhouse database
fields – select specific metadata fields for a table in the database; dictionary metadata field names are dependent on the specific DBMS, e.g. MySQL, SQLite, ClickHouse, etc.
name – table name regular expression e.g. name=’%200%’
kwargs – parameters passed to sql() method
- Returns
metadata for the tables of a database e.g. name of table, number of rows, row length, collation, etc..
-
mysql_connections
= 0¶
-
sql
(query, **kwargs)¶ For kwargs and specific implementation details see the connector class for the specific API, e.g. SQLAlchemy.sql() for the SQLAlchemy database API. :param query: SQL query string that will be sent to the server :param kwargs: pass other parameters to the sql() method of the connector class :return: result set represented with a pandas dataframe, tuples, etc.
-
property
sqlalchemy_dialect
¶
-
sqlite_connections
= 0¶
hypermorph.connector_clickhouse_driver module¶
-
class
hypermorph.connector_clickhouse_driver.
ClickHouse
(host, port, user, password, database, trace=0)¶ Bases:
object
ClickHouse class is based on clickhouse-driver python API for ClickHouse DBMS
-
property
api_category
¶
-
property
connection
¶
-
create_engine
(table, engine, heading, partition_key=None, order_key=None, settings=None, execute=True)¶ - Parameters
table – name of the ClickHouse table
engine – the type of clickhouse engine
settings – clickhouse engine settings
heading – list of field names paired with ClickHouse data types, e.g. [('fld1_name', 'dtype1'), ('fld2_name', 'dtype2'), ..., ('fldN_name', 'dtypeN')]
partition_key –
order_key –
execute –
- Returns
-
property
cursor
¶
-
disconnect
()¶
-
get_columns
(table=None, columns=None, fields=None, aggr=None, **kwargs)¶ - Parameters
table – ClickHouse table name
columns – list of ClickHouse column names
fields – Metadata fields for columns
aggr – aggregate metadata results for the columns of a clickhouse table
- Returns
metadata for clickhouse columns
-
get_mutations
(table, limit=None, group_by=None, execute=True)¶ - Parameters
table – clickhouse table
group_by –
limit – SQL limit
execute –
- Returns
-
get_parts
(table, hb2=None, active=True, execute=True)¶ - Parameters
table – clickhouse table
hb2 – select parts with a specific hb2 dimension (hb2 is the dim2 of the Entity/ASET key) default hb2=’%’
active – select only active parts
execute – Execute the command only if execute=True
- Returns
information about parts of MergeTree tables
-
get_query_log
(execute=True)¶
-
property
last_query_statistics
¶
-
optimize_engine
(table, execute=True)¶
-
property
print_query_statistics
¶
-
sql
(sql, out='dataframe', as_columns=None, index=None, partition_size=None, arrow_encoding=True, params=None, qid=None, execute=True, trace=None)¶ This method calls the clickhouse-driver execute() method to execute an SQL query; the connection has already been established. :param sql: ClickHouse SQL query string that will be sent to the server :param out: output format, i.e. the Python data structure that will represent the result set (dataframe, tuples, json_rows)
- Parameters
as_columns – user specified column names for pandas dataframe, (list of strings, or comma separated string)
index – pandas dataframe columns
arrow_encoding – PyArrow columnar dictionary encoding
arrow_table – Output is PyArrow Table, otherwise it is a PyArrow RecordBatch
partition_size – ToDo number of records to use for each partition or target size of each partition, in bytes
params – clickhouse-client execute parameters
qid – query identifier. If no query id specified ClickHouse server will generate it
execute – execute SQL commands only if execute=True
trace – trace execution of query, i.e. print query, elapsed time, rows in set, etc.
- Returns
result set formatted according to the out parameter
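A hedged sketch of the sql() call; the connection values are illustrative and system.tables is a standard ClickHouse system table:

from hypermorph.connector_clickhouse_driver import ClickHouse

# Hypothetical connection values for illustration only
ch = ClickHouse(host='localhost', port=9000, user='demo',
                password='demo', database='default', trace=0)
df = ch.sql('SELECT name, engine FROM system.tables LIMIT 5')   # default out='dataframe'
rows = ch.sql('SELECT 1', out='tuples')                         # alternative output format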
hypermorph.connector_mysql module¶
-
class
hypermorph.connector_mysql.
MySQL
(host, port, user, password, database, trace=0)¶ Bases:
object
-
property
api_category
¶
-
close
()¶
-
property
connection
¶
-
property
cursor
¶
-
property
last_query
¶
-
set_cursor
(buffered=True, raw=None, dictionary=None, named_tuple=None)¶
-
sql
(sql, out='dataframe', as_columns=None, index=None, partition_size=None, arrow_encoding=True, execute=True, buffered=True, trace=None)¶ This method calls the cursor.execute() method of mysql.connector to execute an SQL query; the connection has already been established. :param sql: MySQL query string that will be sent to the server :param out: output format, i.e. the Python data structure that will represent the result set (dataframe, tuples, named_tuples, json_rows)
- Parameters
partition_size – ToDo number of records to use for each partition or target size of each partition, in bytes
arrow_encoding – PyArrow columnar dictionary encoding
as_columns – user specified column names for pandas dataframe (list of strings, or comma separated string)
index – column names to be used in pandas dataframe index
execute – execute SQL commands only if execute=True
trace – trace execution of query, i.e. print query, elapsed time, rows in set, etc.
buffered – MySQLCursorBuffered cursor fetches the entire result set from the server and buffers the rows. For nonbuffered cursors, rows are not fetched from the server until a row-fetching method is called.
- Returns
result set formatted according to the out parameter
For more details about MySQLCursor class execution see https://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlcursor.html
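A hedged sketch of the sql() call; connection values and table names are illustrative:

from hypermorph.connector_mysql import MySQL

# Hypothetical connection values for illustration only
my = MySQL(host='localhost', port=3306, user='demo',
           password='demo', database='northwind', trace=0)
df = my.sql('SELECT * FROM customers LIMIT 10')              # default out='dataframe'
tuples = my.sql('SELECT * FROM customers', out='tuples')     # plain tuples instead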
hypermorph.connector_sqlalchemy module¶
-
class
hypermorph.connector_sqlalchemy.
SQLAlchemy
(dialect=None, host=None, port=None, user=None, password=None, database=None, path=None, trace=0)¶ Bases:
object
-
property
api_category
¶
-
property
connection
¶
-
property
cursor
¶
-
property
engine
¶
-
property
last_query
¶
-
property
last_query_stats
¶
-
sql
(sql, out='dataframe', execute=True, trace=None, arrow_encoding=True, as_columns=None, index=None, partition_size=None, **kwargs)¶ - Parameters
sql – SQL query string that will be sent to the server
out – output format e.g. dataframe, tuples, ….
execute – flag to enable execution of SQL statement
trace – trace execution of query, i.e. print query, elapsed time, rows in set, etc.
partition_size – number of records to use for each partition or target size of each partition, in bytes
arrow_encoding – PyArrow columnar dictionary encoding
as_columns – user specified column names for pandas dataframe (list of strings, or comma separated string)
index – column names to be used in pandas dataframe index
kwargs – parameters passed to pandas.read_sql() method https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql.html
- Returns
result formatted according to the out parameter
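A hedged sketch using the sqlite dialect listed in the ConnectionPool documentation; the database path is illustrative:

from hypermorph.connector_sqlalchemy import SQLAlchemy

# Hypothetical SQLite connection; `path` is assumed to point at the database file
sa = SQLAlchemy(dialect='sqlite', path='/tmp/demo.sqlite', trace=0)
df = sa.sql('SELECT name FROM sqlite_master', out='dataframe')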
hypermorph.data_graph module¶
-
class
hypermorph.data_graph.
GData
(rebuild=False, load=False, **graph_properties)¶ Bases:
object
GData class represents ABox “assertion components”, i.e. facts associated with a terminological vocabulary. Such a fact is an instance of an HB-HAs association. ABox statements are TBox-compliant statements about that vocabulary. Each instance of an HB-HAs association is compliant with the model (schema) of Entity-Attributes.
GData of HyperMorph is represented with a directed graph that is based on the graph_tool python module. GData is composed of DataNodes and DataEdges. Each DataEdge links two DataNodes and we define a direction convention from a tail DataNode to a head DataNode.
HyperAtom, HyperBond classes are derived from DataNode class
GData of HyperMorph is a hypergraph defined by two sets of objects (a.k.a. hyper-atoms HAs and hyper-bonds HBs). If we have ‘hyper-bonds’ HB={hb1, hb2, hb3} and ‘hyper-atoms’ HA={ha1, ha2, ha3}, then we can make a map such as d = {hb1: (ha1, ha2), hb2: (ha2,), hb3: (ha1, ha2, ha3)}. G(HB, HA, d) is the hypergraph
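Written as a plain Python dict, with purely symbolic names, the map above is:

# Each hyper-bond maps to the tuple of hyper-atoms it connects;
# G(HB, HA, d) is the hypergraph
d = {'hb1': ('ha1', 'ha2'),
     'hb2': ('ha2',),
     'hb3': ('ha1', 'ha2', 'ha3')}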
-
add_edge
(from_vertex, to_vertex)¶ Used in GDataLink to create a new instance of an edge :param from_vertex: tail vertex :param to_vertex: head vertex :return: an edge of GData Graph
-
add_hyperlinks
(hlinks, hb2=10000)¶ Method updates the GData graph with vertices (nodes) and edges (links) that are related to the parameter hlinks. It also adds dim2, dim1 and ntype vertex properties
- Parameters
hb2 – dim2 value for hyperbonds, it is set at a high enough value to filter them later on in the graph of data
:param hlinks: a list of hyperlinks in the form [((hb2, hb1), (ha2, ha1)), ..., ((hb2, hb1), (ha2, ha1))]. A hyperlink is defined as the edge that connects a HyperBond with a HyperAtom, i.e. HB(hb2, hb1) ---> HA(ha2, ha1)
A table of data (hb2) with pk=hb1 is associated (linked) to a column of data (ha2) with indices (ha1):
hb2 (uint16 > 10000) represents a data table
hb1 (uint32) represents a data table row or pk index
ha2 (uint16 < 10000) represents a column of the data table
ha1 (uint32) represents a unique value, i.e. a secondary index value of the specific column (ha2)
Therefore the set of hyperlinks (HBi —> HA1, HBi —> HA2, HBi —> HAn) transforms the tuple of a Relation to an association between the table row and the column values, indices
This association is graphically represented on a hypergraph with a hyperedge (HB) that connects many hypernodes (HAs)
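An illustrative hyperlink list for add_hyperlinks(); the numeric keys are made up but follow the dimension conventions above, and `gdata` is assumed to be an existing GData instance:

# Each pair is HB(hb2, hb1) ---> HA(ha2, ha1): a table row linked to a column value
hlinks = [((10001, 1), (501, 1)),
          ((10001, 1), (502, 3)),
          ((10001, 2), (501, 2))]
gdata.add_hyperlinks(hlinks)   # hb2 keeps its default threshold value 10000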
-
add_link
(from_node, to_node)¶ - Parameters
from_node – tail node is a GDataNode object or node ID
to_node – head node is a GDataNode object or node ID
If there isn’t a link from the tail node to the head node, it will try to create a new one; otherwise it will return an existing GDataLink instance
- Returns
GDataLink object, i.e. an edge of the GData graph
-
add_node
(**nprops)¶ - Parameters
nprops – GData node (vertex) properties
- Returns
HyperBond object
-
add_values
(string_values, hb2=10000)¶ Create and set a value vertex property :param hb2: dim2 value for hyperbonds,
it is set at a high enough value to filter them later on in the graph of data
- Parameters
string_values – string representation of the ha2 column UNIQUE data values, given as a NumPy array of dtype=str
- Returns
-
add_vertex
(**vprops)¶ Used in GDataNode to create a new instance of a node :param vprops: GData vertex properties :return: a vertex of GData Graph
-
add_vertices
(n)¶
-
at
(dim2, dim1)¶ - Parameters
dim2 – ha2 dimension of hyperatom or hb2 dimension of hyperbond
dim1 – ha1 dimension of hyperatom or hb1 dimension of hyperbond
- Returns
the node of the graph with the specific dimensions
-
property
dim1
¶
-
property
dim2
¶
-
get
(nid)¶ - Parameters
nid – Node ID (vertex id)
- Returns
GDataNode object
-
get_node_by_id
(nid)¶ - Parameters
nid – node ID (vertex id)
- Returns
GDataNode object from the derived class, i.e. HyperAtom, HyperBond object see class_dict
-
get_node_by_key
(dim2, dim1)¶ - Parameters
dim2 –
dim1 –
- Returns
object with the specific key
-
get_vp
(vp_name)¶ - Parameters
vp_name – vertex property name
- Returns
VertexPropertyMap object
-
get_vp_value
(vp_name, vid)¶ - Parameters
vp_name – vertex property name
vid – either vertex object or vertex index (node id)
- Returns
the value of vertex property on the specific vertex of the graph
-
get_vp_values
(vp_name, filtered=False)¶
-
property
graph
¶
-
property
graph_properties
¶
-
property
graph_view
¶
-
property
is_filtered
¶
-
property
is_view_filtered
¶
-
property
list_properties
¶
-
property
net_alias
¶
-
property
net_descr
¶
-
property
net_edges
¶
-
property
net_format
¶
-
property
net_name
¶
-
property
net_path
¶
-
property
net_tool
¶
-
property
net_type
¶
-
property
ntype
¶
-
save_graph
()¶ Save HyperMorph GData._graph using the self._net_name, self._net_path and self._net_format
-
set_filter
(vmask, inverted=False)¶ This filters the DGraph._graph instance. Only the vertices with a value different from False are kept in the filtered graph
- Parameters
vmask – boolean mask for the vertices of the graph
inverted – if it is set to TRUE only the vertices with value FALSE are kept.
- Returns
the filtered state of the graph
-
set_filter_view
(vmask)¶ DGraph._graph_view is a filtered view of the DGraph._graph, in that case the state of the DGraph is not affected by the filtering operation, i.e. after filtering DGraph._graph has the same vertices and edges as before filtering :param vmask: boolean mask for the vertices of the graph :return: filtered state of the graph view
-
unset_filter
()¶ Reset the filtering of the DGraph._graph instance :return: the filtered state
-
unset_filter_view
()¶
-
property
value
¶
-
property
vertex_properties
¶
-
property
vertices
¶
-
property
vertices_view
¶
-
property
vid
¶
-
property
vids
¶
-
property
vids_view
¶
-
property
vmask
¶
-
hypermorph.data_graph.
int_to_class
(class_id)¶ - Parameters
class_id – (0 - ‘HyperAtom’) or (1 - ‘HyperBond’)
- Returns
a class that is used in get(), get_node_by_id() methods
hypermorph.data_graph_hyperatom module¶
-
class
hypermorph.data_graph_hyperatom.
HyperAtom
(gdata, vid=None, **node_properties)¶
hypermorph.data_graph_hyperbond module¶
-
class
hypermorph.data_graph_hyperbond.
HyperBond
(gdata, vid=None, **node_properties)¶
hypermorph.data_graph_link module¶
-
class
hypermorph.data_graph_link.
GDataLink
(gdata, from_node, to_node)¶ Bases:
object
Each instance of GDataLink links a tail node with a head node, i.e. HyperBond —> HyperAtom
Each GDataLink has two connectors (bidirectional edges): An outgoing edge from the tail An incoming edge to the head
In the case of a HyperBond (HB) node there are <Many> Outgoing Edges that start < From One > HB In the case of a HyperAtom (HA) node there are <Many> Incoming Edges that end < To One > HA
GDataLink type represents a DIRECTED MANY TO MANY RELATIONSHIP
Important Notice: Do not confuse the DIRECTION OF RELATIONSHIP with the DIRECTION OF TRAVERSING THE BIDIRECTIONAL EDGES of the GDataLink
Many-to-Many Relationship is defined as a (Many-to-One) and (One-to-Many)
MANY side (tail node) --- ONE side (outgoing edge) --- ONE side (incoming edge) --- MANY side (head node)
FROM Node (fromID) == an outgoing edge ==> GDataLink == an incoming edge ==> TO Node (toID)
-
property
edge
¶
-
property
gdata
¶
hypermorph.data_graph_node module¶
-
class
hypermorph.data_graph_node.
GDataNode
(gdata, vid=None, **vprops)¶ Bases:
object
- The GDataNode class:
- if vid is None
create a NEW node, i.e. a new vertex on the graph with properties
- if vid is not None
initialize a node that is represented with an existing vertex with vid
-
property
all
¶
-
property
all_edges_ids
¶
-
property
all_links
¶
-
property
all_nids
¶
-
property
all_nodes
¶
-
property
all_vertices
¶
-
property
gdata
¶
-
get_value
(prop_name)¶ - Parameters
prop_name – Vertex property name (vp_names) or @property function name (calculated_properties) or data type properties (field_meta)
- Returns
the value of property for the specific node
-
property
in_edges_ids
¶
-
property
in_links
¶
-
property
in_nids
¶
-
property
in_nodes
¶
-
property
in_vertices
¶
-
property
key
¶
-
property
out_edges_ids
¶
-
property
out_links
¶
-
property
out_nids
¶
-
property
out_nodes
¶
-
property
out_vertices
¶
-
property
vertex
¶
hypermorph.data_pipe module¶
-
class
hypermorph.data_pipe.
DataPipe
(schema_node, result=None)¶ Bases:
hypermorph.utils.GenerativeBase
Implements method chaining: A query operation, e.g. projection, counting, filtering can invoke multiple method calls. Each method corresponds to a query operator such as: get_components.over().to_dataframe().out()
out() method is always at the end of the chained generative methods to return the final result
Each one of these operators returns an intermediate result (self.fetch), allowing the calls to be chained together in a single statement.
DataPipe methods such as get_rows() are wrapped inside methods of other classes, e.g. get_rows() of Table(SchemaNode), so that when they are called from these methods the result can be chained to other methods of DataPipe. In that way we implement transformations and conversions to multiple output formats easily and intuitively.
- Notice: we distinguish between two different execution types according to the evaluation of the result
Lazy evaluation, see for example to_***() methods
Eager evaluation
- This module has a dual combined purpose:
perform transformations from one data structure to another data structure
load data into volatile memory (RAM, DRAM, SDRAM, SRAM, GDDR) or import data into non-volatile storage (NVRAM, SSD, HDD, Database) with a specific format e.g. parquet, JSON, ClickHouse MergeTree engine table, MYSQL table, etc…
Transformation, importing and loading operations are based on pyarrow/numpy library and ClickHouse columnar DBMS
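A hedged sketch of method chaining, assuming `table` is a Table(SchemaNode) object obtained from the Schema and that the selected column names exist in it:

# get_rows() fetches records, over() projects columns, slice() limits them,
# to_dataframe() converts the result and out() terminates the chain
df = (table.get_rows()
           .over(select='npi, city, state')
           .slice(limit=100)
           .to_dataframe()
           .out())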
-
exclude
(select=None)¶ - Parameters
select – Exclude columns in projection
- Returns
-
get_columns
()¶ Wrapped in Table(SchemaNode) class :return: pass self.fetch to the next chainable operation
-
get_rows
(npartitions=None, partition_size=None)¶ Wrapped in the Table(SchemaNode) class. Fetch either records of an SQL table or rows of a flat file. Notice: specify either the npartitions or the partition_size parameter, or neither of them
- Parameters
npartitions – split the values of the index column linearly; slice() will have the effect of modifying the split accordingly
partition_size – number of records to use for each partition or target size of each partition, in bytes
Notice: npartitions or partition_size will perform a lazy evaluation and it will return a generator object
- Returns
pass self.fetch to the next chainable operation
-
order_by
(columns)¶ - Parameters
columns – comma separated string column names to sort by
- Returns
-
out
(lazy=False)¶ We distinguish between two cases, eager vs lazy evaluation. This is particularly useful when we deal with very large dataframes that do not fit in memory
- Parameters
lazy –
- Returns
use the out() method at the end of the chained generative methods to return the output of SchemaNode objects displayed with the appropriate specified format and structure
-
over
(select=None, as_names=None, as_types=None)¶ - Notice: over() must be present in method chaining when you fetch data by constructing and executing an SQL query; in that case the default projection is self._project = ' * '
- Parameters
select – projection over the selected metadata columns
as_names – list of user-specified column names to use for the resulting frame; these are used: i) to rename columns (SQL AS operator) ii) to extend the result set with calculated columns from an expression
as_types – list of data types or comma separated string of data types, e.g. pandas data types when we read data from flat files using pandas.read_csv and want to disable type inference on those columns
- Returns
pass self.fetch to the next chainable operation
-
property
schema_node
¶
-
slice
(limit=None, offset=0)¶ - Parameters
limit – number of rows to return from the result set
offset – number of rows to skip from the result set
- Returns
SQL statement
-
property
sql_query
¶
-
to_batch
(delimiter=None, nulls=None, skip=0, trace=None, arrow_encoding=True)¶ - Parameters
delimiter – 1-character string specifying the boundary between fields of the record
nulls – list of strings that denote nulls e.g. [‘N’]
skip – number of rows to skip at the start of the flat file
trace – trace execution of query, i.e. print query, elapsed time, rows in set, etc.
arrow_encoding – apply PyArrow columnar dictionary encoding
- Returns
PyArrow RecordBatch with optionally dictionary encoded columns
-
to_dataframe
(data=None, index=None, delimiter=None, nulls=None, trace=None)¶ - Parameters
data – ndarray (structured or homogeneous), Iterable, dict
index – column names of the result set to use in pandas dataframe index
delimiter – 1-character string specifying the boundary between fields of the record
nulls – list of strings that denote nulls e.g. [‘N’]
trace – trace execution of query, i.e. print query, elapsed time, rows in set, etc.
- Returns
pandas dataframe
-
to_feather
(path, **feather_kwargs)¶ - Parameters
path – full path of the feather file
feather_kwargs –
- Returns
file_location
-
to_parquet
(path, **parquet_kwargs)¶ - Parameters
path – full path string of the parquet file
parquet_kwargs – row_group_size, version, use_dictionary, compression (see…
https://pyarrow.readthedocs.io/en/latest/generated/pyarrow.parquet.write_table.html#pyarrow.parquet.write_table :return: file_location
-
to_table
(delimiter=None, nulls=None, skip=0, trace=None, arrow_encoding=True)¶ - Notice1: This is a transformation from a row layout to a column layout, i.e. chained to the get_rows() method.
Dictionary encoded columnar layout is a fundamental component of the HyperMorph associative engine.
Notice2: The output is a PyArrow Table data structure with a columnar layout, NOT a row layout.
Notice3: The method is also used when we fetch columns directly from columnar data storage, e.g. a ClickHouse columnar database or parquet files, i.e. chained to the get_columns() method
- Parameters
delimiter – 1-character string specifying the boundary between fields of the record
nulls – list of strings that denote nulls e.g. [‘N’]
skip – number of rows to skip at the start of the flat file
trace – trace execution of query, i.e. print query, elapsed time, rows in set, etc.
arrow_encoding – apply PyArrow columnar dictionary encoding
- Returns
PyArrow in-memory table with a columnar data structure with optionally dictionary encoded columns
-
to_tuples
(trace=None)¶ ToDo NumPy structured arrays representation…. :param trace: trace execution of query, i.e. print query, elapsed time, rows in set, etc. :return:
-
where
(condition=None)¶
hypermorph.draw_hypergraph module¶
hypermorph.exceptions module¶
-
exception
hypermorph.exceptions.
ASETError
¶ Bases:
hypermorph.exceptions.HyperMorphError
Raised when it fails to construct an AssociativeSet instance
-
exception
hypermorph.exceptions.
AssociationError
¶
-
exception
hypermorph.exceptions.
ClickHouseException
¶ Bases:
hypermorph.exceptions.HyperMorphError
Raised when it fails to execute query in ClickHouse
-
exception
hypermorph.exceptions.
DBConnectionFailed
¶ Bases:
hypermorph.exceptions.HyperMorphError
Raised when it fails to create a connection with the database
-
exception
hypermorph.exceptions.
GraphError
¶ Bases:
hypermorph.exceptions.HyperMorphError
Raised in Schema methods
-
exception
hypermorph.exceptions.
GraphLinkError
¶ Bases:
hypermorph.exceptions.HyperMorphError
Raised in SchemaLink methods
-
exception
hypermorph.exceptions.
GraphNodeError
¶ Bases:
hypermorph.exceptions.HyperMorphError
Raised in SchemaNode methods or in any of the methods of SchemaNode subclasses
-
exception
hypermorph.exceptions.
HACOLError
¶ Bases:
hypermorph.exceptions.HyperMorphError
Raised when it fails to initialize HACOL
-
exception
hypermorph.exceptions.
HyperMorphError
¶ Bases:
Exception
Base class for all HyperMorph-related errors
-
exception
hypermorph.exceptions.
InvalidAddOperation
¶ Bases:
hypermorph.exceptions.HyperMorphError
Raised when you call DataManagementFramework.add() with invalid parameters
-
exception
hypermorph.exceptions.
InvalidDelOperation
¶ Bases:
hypermorph.exceptions.HyperMorphError
Raised when you call DataManagementFramework.del() with invalid parameters
-
exception
hypermorph.exceptions.
InvalidEngine
¶ Bases:
hypermorph.exceptions.HyperMorphError
Raised when we pass a wrong type of HyperMorph engine
-
exception
hypermorph.exceptions.
InvalidGetOperation
¶ Bases:
hypermorph.exceptions.HyperMorphError
Raised when you call DataManagementFramework.get() with invalid parameters
-
exception
hypermorph.exceptions.
InvalidPipeOperation
¶ Bases:
hypermorph.exceptions.HyperMorphError
Raised when it fails to execute an operation in a pipeline
-
exception
hypermorph.exceptions.
InvalidSQLOperation
¶ Bases:
hypermorph.exceptions.HyperMorphError
Raised when it fails to execute an SQL command
-
exception
hypermorph.exceptions.
InvalidSourceType
¶ Bases:
hypermorph.exceptions.HyperMorphError
Raised when we pass a wrong source type of HyperMorph
-
exception
hypermorph.exceptions.
MISError
¶ Bases:
hypermorph.exceptions.HyperMorphError
Raised in operations with DataDictionary
-
exception
hypermorph.exceptions.
PandasError
¶ Bases:
hypermorph.exceptions.HyperMorphError
Raised when it fails to construct pandas dataframe
-
exception
hypermorph.exceptions.
UnknownDictionaryType
¶ Bases:
hypermorph.exceptions.HyperMorphError
Raised when trying to add a term in the dictionary with an unknown type. Types can be either:
HyperEdges, i.e. instances of the TBoxTail class: DRS, DMS, DLS - (dim4, 0, 0); HLT, DS, DM - (dim4, dim3, 0)
HyperNodes, i.e. instances of the TBoxHead class: TSV, CSV, FLD - (dim4, dim3, dim2); ENT, ATTR - (dim4, dim3, dim2)
-
exception
hypermorph.exceptions.
UnknownPrimitiveDataType
¶ Bases:
hypermorph.exceptions.HyperMorphError
Primitive Data Types are: [‘bln’, ‘int’, ‘flt’, ‘date’, ‘time’, ‘dt’, ‘enm’, ‘uid’, ‘txt’, ‘wrd’]
-
exception
hypermorph.exceptions.
WrongDictionaryType
¶ Bases:
hypermorph.exceptions.HyperMorphError
Raised when we attempt to call a specific method on an object that has the wrong node type
hypermorph.hacol module¶
-
class
hypermorph.hacol.
HAtomCollection
(attribute, data)¶ Bases:
object
A HyperAtom Collection (HACOL) can be: 1. A set of hyperatoms (HACOL_SET) that represent the domain of values for a specific attribute
2. A multiset of hyperatoms (HACOL_BAG) that represents a column of data in a table Each hyperatom may appear multiple times in this collection because each hyperatom is linked to one or more hyperbonds (MANY-TO-MANY relationship)
3. A set of values of a specific data type (HACOL_VAL) where each value is associated with a hyperatom from the set of hyperatoms (HACOL_SET) to form a KV pair.
The set of KV pairs represents the domain of a specific attribute, where K is the key of the hyperatom with dimensions (dim3-model, dim2-attribute, dim1-distinct value) and V is the data type value
HyperAtoms can be displayed with K, V or K:V pair
All hyperatoms in (1), (2) and (3) have common dimensions (dim3, dim2) i.e. same model, same attribute
- HACOL brings together, but at the same time keeps separate, the following under the same object:
metadata stored in an Attribute of the DataModel
data (self._data) stored in a PyArrow DictionaryEncoded Array object
Notice: data points to a DictionaryEncoded Array object which is a column of a PyArrow Table
-
count
(dataframe=True)¶
-
property
data
¶
-
dictionary
(columns=None, index=None, order_by=None, ascending=None, limit=None, offset=0)¶ - Parameters
columns – list (or comma separated string) of column names for pandas dataframe
index – list (or comma separated string) of column names to include in pandas dataframe index
order_by – str or list of str Name or list of names to sort by
ascending – bool or list of bool, default True the sorting order
limit – number of records to return from states dictionary
offset – number of records to skip from states dictionary
- Returns
states dictionary of HACOL
-
property
filtered
¶
-
property
filtered_data
¶
-
property
hatoms_included
¶
-
is_filtered
()¶ - Returns
The filtered state of the HACOL
-
memory_usage
(mb=True, dataframe=True)¶
-
property
pipe
¶ Returns a HACOLPipe GenerativeBase object that refers to an instance of a HyperCollection; use this object to chain operations and to update the state of the HyperCollection instance.
-
print_states
(limit=10)¶ wrapper for dictionary() :param limit: :return:
-
property
q
¶ wrapper for the starting point of a query pipeline :return:
-
reset
()¶
-
update_frequency_include_color_state
(indices)¶ In associative filtering we update frequency, include and color state for ALL HACOLs
- Parameters
indices – unique indices of filtered values (pyarrow.lib.Int32Array) these are values that are included in a column of a filtered table
- Returns
-
update_select_state
(indices)¶ - Parameters
indices – unique indices of the selected values (pyarrow.lib.Int32Array)
- Returns
-
property
values_included
¶
hypermorph.hacol_pipe module¶
-
class
hypermorph.hacol_pipe.
HACOLPipe
(hacol, result=None)¶ Bases:
hypermorph.utils.GenerativeBase
-
And
()¶ ToDo: …. :return:
-
In
()¶ ToDo:….. 1st case comma separated string or list of string values e.g. ‘Fairfax Village, Anacostia Metro, Thomas Circle, 15th & Crystal Dr’
(‘Fairfax Village’, ‘Anacostia Metro’, ‘Thomas Circle’, ‘15th & Crystal Dr’)
2nd case list of numeric values e.g. (31706, 31801, 31241, 31003)
-
Not
()¶ ToDo: …. :return:
-
Or
()¶ ToDo: …. :return:
-
between
(low, high, low_open=False, high_open=False)¶ ToDo:… scalar operations with an interval :param low: lower limit point :param high: upper limit point :param low_open: :param high_open:
closed interval (default) ---> low_open=False, high_open=False
open interval ---> low_open=True, high_open=True
half-open interval ---> low_open=False, high_open=True
half-open interval ---> low_open=True, high_open=False
- Returns
BooleanArray Mask that is used in filter()
-
count
(dataframe=True)¶ - Parameters
dataframe – flag to display output with a Pandas dataframe
- Returns
number of values in filtered/unfiltered state number of hatoms in filtered/unfiltered state
-
filter
(mask=None)¶ It uses a boolean array mask (self.fetch) constructed in previous chained operation to filter HACOL data represented with a DictionaryArray
- Parameters
mask – this is used when we call filter() externally from ASETPipe.filter() method to update the filtering state of HACOL
- Returns
DictionaryArray, i.e. HACOL.data filtered the filtered DictionaryArray is pointed at self._hacol.filtered_data
-
like
(pattern)¶ Notice: like operator can also be used in where() as a string :param str pattern: match substring in column string values :return: PyArrow Boolean Array mask (self.fetch) that is used in filter()
it also returns boolean mask to calls from ASETPipe.where(), ASETPipe.And() methods
-
out
(lazy=False)¶ We distinguish between two cases, eager vs lazy evaluation. This is particularly useful when we deal with very large HyperAtom collections that do not fit in memory
- Parameters
lazy –
- Returns
use out() method at the end of the chained generative methods to return the output displayed with the appropriate specified format and structure
-
slice
(limit=None, offset=0)¶ slice is used either to limit the number of entries to return in the states dictionary or to limit the members of HyperAtom collection, i.e. hyperatoms (values)
- Parameters
limit – number of records to return from the result set
offset – number of records to skip from the result set
- Returns
A slice of records
-
start
()¶ This is used as the first method in a chain of other methods; here we set the filtered/unfiltered data. Pipeline methods slice(), to_array(), to_numpy(), to_series() start here. :return: DictionaryArray either in filtered or unfiltered state
-
to_array
(order=None, unique=False)¶ - Parameters
order – default None, ‘asc’, ‘desc’
unique – take distinct elements in array
- Returns
by default PyArrow Array or PyArrow DictionaryArray if dictionary=False
-
to_hyperlinks
(hb2=10001)¶ - Parameters
hb2 – dim2 value for hyperbonds, it is set at a high enough value >10000 to filter them later on in the graph of data
- Returns
HyperLinks (edges that connect a HyperBond with HyperAtoms) List of pairs in the form [ ((hb2, hb1), (ha2, ha1)), ((hb2, hb1), (ha2, ha1)), …] These are used to create a data graph
-
to_numpy
(order=None, limit=None, offset=0)¶ - Parameters
order – default None, ‘asc’, ‘desc’
limit – number of values to return from HACOL
offset – number of values to skip from HACOL
- Returns
-
to_series
(order=None, limit=None, offset=0)¶ - Parameters
order – default None, ‘asc’, ‘desc’
limit – number of values to return from HACOL
offset – number of values to skip from HACOL
- Returns
Pandas Series
-
to_string_array
()¶ - Returns
List of string values This is a string representation for the valid (non-null) values of the filtered HACOL It is used in the construction of a data graph to set the value property of the node
-
where
(condition='$v')¶ Example: phys.q.where(‘city like ATLANTA’) Notice: Entering where() method, self.fetch = self._hacol.filtered_data
Thus pc.match_substring(), pc.greater(), pc.equal() etc… are applied to either already filtered or unfiltered (self._hacol.filtered_data = self._hacol.data) DictionaryArray
- Parameters
condition –
- Returns
PyArrow Boolean Array mask (self.fetch) that is used in filter() it also returns boolean mask to calls from ASETPipe.where(), ASETPipe.And() methods
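A hedged sketch of a HACOL query pipeline, reusing the `phys` collection and the `city` condition from the example above; the exact chaining order is an assumption based on the method descriptions:

mask = phys.q.where('city like ATLANTA').out()     # PyArrow Boolean Array mask
phys.q.where('city like ATLANTA').filter().out()   # filter the HACOL data
s = phys.q.start().to_series(limit=10).out()       # filtered values as a pandas Series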
hypermorph.haset module¶
-
class
hypermorph.haset.
ASET
(entity, debug)¶ Bases:
object
An AssociativeSet, also called an AssociativeEntitySet, is ALWAYS bound to a SINGLE entity. An AssociativeSet is a Set of Association objects (see the Association class). An AssociativeSet can also be represented with a set of HyperBonds
There is a direct analogy with the Relational model:
Relation: A set of tuples ---> Associative Set: A set of Associations
Body: tuples of ordered values ---> Body: Associations
Heading: A tuple of ordered attribute names ---> Heading: A set of attributes
View: Derived relation ---> Associative View: A derived set of Associations
- ASET brings together, but at the same time keeps separate, the following under the same object:
metadata stored in an Entity of the DataModel
data (self._data) stored in a PyArrow DictionaryEncoded Table object from one or more DataSet(s)
-
property
attributes
¶
-
count
()¶ wrapper for ASETPipe.count() method :return:
-
property
data
¶
-
dictionary_encode
(delimiter=None, nulls=None, skip=0, trace=None)¶ It will load data from the DataSet; it currently supports tabular formats (rows or columns of a data table) and will apply PyArrow DictionaryArray encoding to the columns
- Parameters
delimiter – 1-character string specifying the boundary between fields of the record
nulls – list of strings that denote nulls e.g. [‘N’]
skip – number of rows to skip at the start of the flat file
trace – trace execution of query, i.e. print query, elapsed time, rows in set, etc.
- Returns
PyArrow RecordBatch constructed with DictionaryEncoded Array objects
-
property
entity
¶
-
property
filtered
¶
-
property
filtered_data
¶
-
property
hacols
¶
-
property
hbonds
¶
-
is_filtered
()¶ - Returns
The filtered state of ASET
-
property
mask
¶
-
memory_usage
(mb=True, dataframe=True)¶ - Parameters
mb – output units MegaBytes
dataframe – flag to display output with a Pandas dataframe
- Returns
-
property
num_rows
¶
-
property
pipe
¶ Returns an ASETPipe GenerativeBase object that refers to an instance of a HyperCollection; use this object to chain operations and to update the state of the HyperCollection instance.
-
print_rows
(select=cname_list, order_by='city, last, first', limit=20, index='npi, pacID')¶ - Parameters
select –
as_names –
index –
order_by –
ascending –
limit –
offset –
- Returns
-
property
q
¶ wrapper for the starting point of a query pipeline :return:
-
reset
(hacols_only=False)¶ - ASET reset includes:
Construction of a PyArrow Boolean Array mask with ALL True
reset of the filtered state to False
reset of HyperBonds
reset of HACOLs
- Parameters
hacols_only – Flag for partial reset of HACOLs only
- Returns
-
property
select
¶ wrapper for the starting point of a query pipeline in associative filtering mode :return:
-
update_hacols_filtered_state
()¶ Update the filtering state of HyperAtom collections. This is used when we want to operate on HyperAtom collections in a filtered state: <aset>.<hacol>.<operation>
For a single HACOL we can also use the form <aset>.<hacol>.q.filter(<aset.mask>).<operation>.out() :return:
hypermorph.haset_pipe module¶
-
class
hypermorph.haset_pipe.
ASETPipe
(aset, result=None)¶ Bases:
hypermorph.utils.GenerativeBase
-
And
(condition)¶ - Parameters
condition –
- Returns
BooleanArray Mask that is used in filter()
-
count
()¶ - Returns
number of hbonds (rows) in filtered/unfiltered state
-
filter
()¶ - Returns
-
out
(lazy=False)¶ We distinguish between two cases, eager vs lazy evaluation. This is particularly useful when we deal with very large dataframes that do not fit in memory
- Parameters
lazy –
- Returns
use the out() method at the end of the chained generative methods to return the output of SchemaNode objects displayed with the appropriate specified format and structure
-
over
(select=None, as_names=None, as_types=None)¶ Notice: over(), i.e. projection is chained after the filter() method
- Parameters
select – projection over the selected metadata columns
as_names – list of user-specified column names to use for the resulting dataframe; these are used: i) to rename columns (SQL AS operator) ii) to extend the result set with calculated columns from an expression
as_types – list of data types or comma separated string of data types
- Returns
RecordBatch
-
select
()¶ - Warning: DO NOT CONFUSE select() with over() operator
In HyperMorph select() is used as a flag to alter the state of HyperAtom collections. This is the associative filtering that takes place, where we:
Change the filtering state of HyperAtom collections
Update the selection, included states for each member of the HyperAtom collection
From an end-user perspective that results in selecting values from a HyperAtom collection
- Notice: In associative filtering mode we use only where() restriction
and we filter with values from a SINGLE HyperAtom collection
- Returns
-
slice
(limit=None, offset=0)¶ - Parameters
limit – number of records to return from the result set
offset – number of records to skip from the result set
- Returns
A slice of records
-
start
()¶ This is used as the first method in a chain of other methods; here we set the filtered/unfiltered data. Pipeline methods over(), slice(), to_record_batch(), to_records(), to_table(), to_dataframe() start here. :return: RecordBatch either in filtered or unfiltered state
-
to_dataframe
(index=None, order_by=None, ascending=None, limit=None, offset=0)¶ - Notice1: Use to_record_batch() transformation before chaining it to Pandas DataFrame,
it is a lot faster this way because it decodes PyArrow RecordBatch, i.e. RecordBatch columns are not dictionary encoded
- Notice2: sorting (order_by, ascending) and slicing (limit, offset) in a Pandas dataframe is slow
but sorting has not been implemented in PyArrow and that is why we pass these parameters here
- Parameters
order_by – str or list of str Name or list of names to sort by
ascending – bool or list of bool, default True the sorting order
limit – number of records to return from the result set
offset – number of records to skip from the result set
index – list (or comma separated string) of column names to include in pandas dataframe index
- Returns
Pandas dataframe
-
to_hyperlinks
()¶ - Returns
HyperLinks (edges that connect a HyperBond with HyperAtoms) List of pairs in the form [ ((hb2, hb1), (ha2, ha1)), ((hb2, hb1), (ha2, ha1)), …] These are used to create a data graph
- Notice: Set HACOLs to filtered state first,
using self._aset.update_hacols_filtered_state()
-
to_record_batch
()¶ - Returns
PyArrow RecordBatch but columns are not dictionary encoded
Notice: Always decode PyArrow RecordBatch before sending it to Pandas DataFrame, it is a lot faster
-
to_records
()¶ - Returns
NumPy Records
-
to_string_array
(unique=False)¶ - Parameters
unique –
- Returns
List of string values This is a string representation for the valid (non-null) values of the filtered HACOL It is used in the construction of a data graph to set the value property of the node
- Notice: Set HACOLs to filtered state first,
using self._aset.update_hacols_filtered_state()
-
to_table
()¶ - Returns
PyArrow Table
-
where
(condition)¶ Notice: The minimum condition you specify is the attribute name or the attribute dim2 dimension. Valid conditions: ‘$2’, ‘quantity’, ‘price>=4’, ‘size = 10’
- Parameters
condition –
- Returns
BooleanArray Mask that is used in filter()
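A hedged sketch of an associative query, assuming `aset` is an ASET instance whose attributes include the columns used in the examples of this page:

# where() builds a boolean mask, filter() applies it, over() projects
# (chained after filter(), as noted above) and out() returns the result
df = (aset.q.where('price>=4')
            .filter()
            .over(select='prtID, prtnam')
            .to_dataframe()
            .out())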
hypermorph.hassoc module¶
-
class
hypermorph.hassoc.
Association
(*pos_args, **kw_args)¶ Bases:
object
This is the analogue of a relational tuple, i.e. row of ordered values An Association is the basic construct of Associative Sets
It is called Association because it associates a HyperBond to a set of HyperAtoms. HyperBond is a symbolic 2D numerical representation of a row, and HyperAtom is a symbolic 2D numerical representation of a unique value in the table column. HyperAtoms can also have a textual (string) representation
Association can be represented in many ways: i) With the hb key A[7, 4]
ii) With keyword arguments Association(hb=(7, 4), prtcol=None, prtwgt=None, prtID=227, prtnam=’car battery’, prtunt=None)
iii) With positional arguments Association((7,4), None, None, 227, ‘car battery’, None)
heading: a set of attributes and a key e.g. (‘hb’, ‘prtcol’, ‘prtwgt’, ‘prtID’, ‘prtnam’, ‘prtunt’)
body: KV pairs e.g. Association(hb=(7, 4), prtcol=None, prtwgt=None, prtID=227, prtnam=’car battery’, prtunt=None)
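A sketch of constructing an Association with the keyword and positional forms shown above; whether the heading must be set beforehand depends on Association internals not shown here:

from hypermorph.hassoc import Association

# Keyword form (ii): the hb key plus attribute KV pairs
assoc = Association(hb=(7, 4), prtcol=None, prtwgt=None,
                    prtID=227, prtnam='car battery', prtunt=None)
# Positional form (iii) of the same association
assoc2 = Association((7, 4), None, None, 227, 'car battery', None)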
-
property
body
¶
-
static
change_heading
(*fields)¶
-
get
()¶
-
property
heading_fields
¶
hypermorph.mis module¶
-
class
hypermorph.mis.
MIS
(debug=0, rebuild=False, warning=True, load=False, **kwargs)¶ Bases:
object
MIS is a builder pattern class based on Schema class, ….
-
add
(what, **kwargs)¶ Add new nodes to HyperMorph Schema or an Associative Entity Set :param what: the type of node to add (datamodel, entity, entities, attribute, dataset) :param kwargs: pass keyword arguments to Schema.add() method :return: the object(s) that were added to HyperMorph Schema
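A hedged sketch of the MIS wrapper; `cname` is a node property name that appears elsewhere in these docs and the string values are illustrative:

from hypermorph.mis import MIS

mis = MIS(debug=0)                                          # builds or loads the Schema
dm = mis.add('datamodel', cname='Supplier Part Catalog')    # kwargs forwarded to Schema.add()
overview_df = mis.overview                                  # systems, datamodels, datasets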
-
static
add_aset
(from_table=None, with_fields=None, entity=None, entity_name=None, entity_alias=None, entity_description=None, datamodel=None, datamodel_name='NEW Data Model', datamodel_alias='NEW_DM', datamodel_descr=None, attributes=None, as_names=None, as_types=None, debug=0)¶ - There are three ways to create an ASET object:
From an Entity that already has a mapping defined (entity); fields are mapped onto the attributes of an existing Entity
From a Table of a dataset (from_table, with_fields) whose fields are mapped onto the attributes of a NEW Entity that is created in an existing DataModel
From a Table of a dataset (from_table, with_fields) whose fields are mapped onto the attributes of a NEW Entity that is created in a NEW DataModel
Cases (2) and (3) define a new mapping between a data set and a data model
- Parameters
from_table –
with_fields –
entity –
entity_name –
entity_alias –
entity_description –
datamodel –
datamodel_name –
datamodel_alias –
datamodel_descr –
attributes –
as_names –
as_types –
debug –
- Returns
-
property
all_nodes
¶
-
at
(*args)¶
-
property
datamodels
¶
-
property
datasets
¶
-
property
dms
¶
-
property
drs
¶
-
get
(nid, what='node', select=None, index=None, out='dataframe', junction=None, mapped=None, key_column='nid', value_columns='cname', filter_attribute=None, filter_value=None, reset=False)¶ This method implements the functional paradigm; it is basically a wrapper of chainable methods. For example:
get(461).get_entities().over(select='nid, dim3, dim2, cname, alias, descr').to_dataframe(index='dim3, dim2').out()
can be written as
get(461, what='entities', select='nid, dim3, dim2, cname, alias, descr', out='dataframe', index='dim3, dim2')
- Parameters
nid –
what –
select –
index –
out –
junction –
mapped –
key_column –
value_columns –
filter_attribute –
filter_value –
reset –
- Returns
-
get_all_nodes
()¶
-
get_datamodels
()¶
-
get_datasets
()¶
-
get_overview
()¶
-
get_systems
()¶
-
property
hls
¶
-
load
(**kwargs)¶
-
property
mem
¶
-
property
mms
¶
-
property
overview
¶
-
rebuild
(warning=True, **kwargs)¶
-
property
root
¶
-
save
()¶
-
static
size_of_dataframe
(df, deep=False)¶
-
static
size_of_object
(obj)¶
-
property
sls
¶
-
property
systems
¶
hypermorph.schema module¶
-
class
hypermorph.schema.
Schema
(rebuild=False, load=False, **graph_properties)¶ Bases:
object
Schema class creates a data catalog, i.e. a metadata repository. The data catalog resembles (TBox) a vocabulary of “terminological components”, i.e. abstract terms. Data catalog properties, e.g. dimensions, names, counters, etc., describe the concepts in a data dictionary. These terms are Entity types, Attribute types, Data Resource types, Link (edge) types, etc. TBox is about types and relationships between types, e.g. Entity-Attribute, Table-Column, Object-Fields, etc.
Schema of HyperMorph is represented with a directed graph that is based on the graph_tool python module. The Schema graph is composed of SchemaNodes and SchemaEdges. Each SchemaEdge links two SchemaNodes and we define a direction convention from a tail SchemaNode to a head SchemaNode.
System, DataModel, DataSet, GraphDataModel, Table, Field, classes are derived from SchemaNode class
Schema of HyperMorph is a hypergraph defined by two sets of objects (a.k.a. hyper-nodes and hyper-edges). If we have ‘hyper-edges’ HE={he1, he2, he3} and ‘hyper-nodes’ HN={hn1, hn2, hn3}, then we can make a map such as d = {he1: (hn1, hn2), he2: (hn2,), he3: (hn1, hn2, hn3)}. G(HE, HN, d) is the hypergraph
-
add
(what, with_components=False, datamodel=None, **kwargs)¶ Wrapper method for add methods
- Parameters
what – the type of node to add (datamodel, entity, entities, attribute, dataset)
with_components –
- existing components of the dataset to add; valid parameters are
['tables', 'fields'], 'tables', 'graph data models', 'schemata'
- 'tables': For datasets in a DBMS, add database tables.
For datasets from files with a tabular structure, add files of a specific type in a folder. Files with a tabular structure are flat files (CSV, TSV), Parquet files, Excel files, etc. Note: These are added as new Table nodes of HyperMorph Schema with type TBL
- 'fields': Either add columns of a database table or fields of a file with a tabular structure.
Note: These are added as new Field nodes of HyperMorph Schema with type FLD
- 'graph data models': A dataset of graph data models, i.e. files of type .graphml or .gt in a folder.
Each file in the set serializes, i.e. represents, a HyperMorph DataModel
- 'schemata': A dataset of HyperMorph schemata, i.e. files of type .graphml or .gt in a folder.
Each file in the set serializes, i.e. represents, a HyperMorph Schema
datamodel – A node of type DM to add NEW nodes of type Entity and Attribute
kwargs – Other keyword arguments to pass
- Returns
the object(s) that were added to HyperMorph Schema
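Hedged examples of the add() wrapper, assuming `schema` is a Schema instance and `dm` is an existing DataModel node; the keyword values are illustrative:

# Add a dataset together with its tables and fields
ds = schema.add('dataset', with_components=['tables', 'fields'], cname='demo dataset')
# Add a NEW entity under an existing datamodel
ent = schema.add('entity', datamodel=dm, cname='Part')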
-
add_datamodel
(**nprops)¶ - Parameters
nprops – schema node (vertex) properties
- Returns
DataModel object
-
add_dataset
(**nprops)¶ - Parameters
nprops – schema node (vertex) properties
- Returns
DataSet object
-
add_edge
(from_vertex, to_vertex, **eprops)¶ - Parameters
from_vertex – tail vertex
to_vertex – head vertex
eprops – Schema edge properties
- Returns
an edge of Schema Graph
-
add_edges
(elist)¶ Notice: it is not used in this module….
- Parameters
elist – edge list
- Returns
-
add_link
(from_node, to_node, **eprops)¶ - Parameters
from_node – tail node is a SchemaNode object or node ID
to_node – head node is a SchemaNode object or node ID
eprops – edge properties
If there isn’t a link from the tail node to the head node, it will try to create a new one; otherwise it will return an existing SchemaLink instance
- Returns
SchemaLink object, i.e. an edge of the schema graph
-
add_vertex
(**vprops)¶ - Parameters
vprops – Schema vertex properties
- Returns
a vertex of Schema Graph
-
property
alias
¶
-
property
all_nodes
¶ - Returns
shortcut for SchemaPipe operation to set the GraphView in unfiltered state and get all the nodes
-
at
(dim4, dim3, dim2)¶ Notice: Only data model, data resource objects have keys with dimensions (dim4, dim3, dim2)
- Parameters
dim4 – dim4 is taken from self.dms.dim4 or self.drs.dim4; it is fixed and never changes
dim3 – represents a datamodel or dataset object
dim2 – represents a component of datamodel or dataset object
- Returns
the dataset or the datamodel object with the specific key
-
property
cname
¶
-
property
counter
¶
-
property
ctype
¶
-
property
datamodels
¶ - Returns
shortcut for SchemaPipe operations to output datamodels metadata in a dataframe
-
property
datasets
¶ - Returns
shortcut for SchemaPipe operations to output datasets metadata in a dataframe
-
property
descr
¶
-
property
dim2
¶
-
property
dim3
¶
-
property
dim4
¶
-
property
dms
¶
-
property
drs
¶
-
property
ealias
¶
-
property
edge_properties
¶
-
property
elabel
¶
-
property
ename
¶
-
property
etype
¶
-
property
extra
¶
-
get
(nid)¶ - Parameters
nid – Node ID (vertex id)
- Returns
SchemaNode object
-
get_all_nodes
()¶ - Returns
result from get_all_nodes method that can be chained to other operations e.g. filter_view(),
-
get_datamodels
()¶ - Returns
result from get_datamodels method that can be chained to other operations e.g. over(), out()
use out() at the end of the chained methods to retrieve the final result
-
get_datasets
()¶ - Returns
result from get_datasets method that can be chained to other operations e.g. over(), out()
use out() at the end of the chained methods to retrieve the final result
-
get_ep
(ep_name)¶ - Parameters
ep_name – edge property name
- Returns
EdgePropertyMap object
-
get_ep_value
(ep_name, edge)¶ - Parameters
ep_name – edge property name
edge –
- Returns
the enumerated value of the edge property on the specific edge of the graph; the value is enumerated with a key in the eprop_dict
-
get_ep_values
(ep_name)¶
-
get_node_by_id
(nid)¶ - Parameters
nid – node ID (vertex id)
- Returns
SchemaNode object
-
get_node_by_key
(dim4, dim3, dim2)¶ Notice: Only data model, data resource objects have keys with dimensions (dim4, dim3, dim2)
- Parameters
dim4 – dim4 is taken from self.dms.dim4 or self.drs.dim4; it is fixed and never changes
dim3 – represents a datamodel or dataset object
dim2 – represents a component of datamodel or dataset object
- Returns
the dataset or the datamodel object with the specific key
-
get_overview
()¶ - Returns
result from get_datamodels method that can be chained to other operations e.g. over(), out()
use out() at the end of the chained methods to retrieve the final result
-
get_systems
()¶ - Returns
result from get_systems method that can be chained to other operations e.g. over(), out()
use out() at the end of the chained methods to retrieve the final result
-
get_vp
(vp_name)¶ - Parameters
vp_name – vertex property name
- Returns
VertexPropertyMap object
-
get_vp_value
(vp_name, vid)¶ - Parameters
vp_name – vertex property name
vid – either vertex object or vertex index (node id)
- Returns
the value of vertex property on the specific vertex of the graph
-
get_vp_values
(vp_name, filtered=False)¶
-
property
graph
¶
-
property
graph_properties
¶
-
property
graph_view
¶
-
property
hls
¶
-
property
is_filtered
¶
-
property
is_view_filtered
¶
-
property
list_properties
¶
-
property
net_alias
¶
-
property
net_descr
¶
-
property
net_edges
¶
-
property
net_format
¶
-
property
net_name
¶
-
property
net_path
¶
-
property
net_tool
¶
-
property
net_type
¶
-
property
ntype
¶
-
property
overview
¶ - Returns
shortcut for SchemaPipe operations to output an overview of systems, datamodels, datasets in a dataframe
-
property
root
¶
-
save_graph
()¶ Save HyperMorph Schema._graph using the self._net_name, self._net_path and self._net_format
-
set_filter
(filter_value, filter_attribute=None, operator='eq', reset=True, inverted=False)¶ This filters the Schema Graph instance :param filter_value: the value of the attribute to filter vertices of the graph, or a list of node ids (vertex ids)
- Parameters
filter_attribute – is a defined vertex property for filtering vertices of the graph (Schema nodes) to create a GraphView
operator – e.g. comparison operator for the values of node
reset – set the GraphView in the unfiltered state, i.e. parameter vfilt=None; set the vertex mask in the unfiltered state, i.e. fill the array with zeros. This step is necessary when we filter with node_ids
inverted –
- Returns
the filtered state
-
set_filter_view
(filter_value, filter_attribute=None, operator='eq', reset=True)¶ GraphView is a filtered view of the Graph, in that case the state of the Graph is not affected by the filtering operation, i.e. after filtering Graph has the same nodes and edges as before filtering
- Parameters
filter_value – the value of the attribute to filter vertices of the graph or a list of node ids (vertex ids)
filter_attribute – is a defined vertex property for filtering vertices of the graph (Schema nodes) to create a GraphView
operator – e.g. comparison operator for the values of node
reset – set the GraphView in the unfiltered state, i.e. parameter vfilt=None; set the vertex mask in the unfiltered state, i.e. fill the array with zeros. This step is necessary when we filter with node_ids
- Returns
-
property
sls
¶
-
property
systems
¶ - Returns
shortcut for SchemaPipe operations to output systems metadata in a dataframe
-
unset_filter
()¶ Reset the filtering of the Schema Graph instance :return: the filtered state
-
unset_filter_view
()¶
-
property
vertex_properties
¶
-
property
vertices
¶
-
property
vertices_view
¶
-
property
vid
¶
-
property
vids
¶
-
property
vids_view
¶
-
property
vmask
¶
-
hypermorph.schema.
str_to_class
(class_name)¶ - Parameters
class_name – e.g. Table, Entity, Attributes (see class_dict)
- Returns
a class that is used in get(), get_node_by_id() methods
hypermorph.schema_dms_attribute module¶
-
class
hypermorph.schema_dms_attribute.
Attribute
(schema, vid=None, **node_properties)¶ Bases:
hypermorph.schema_node.SchemaNode
Notice: all get_* methods return node ids so that they can be converted easily to many forms: keys, dataframe, SchemaNode objects, etc.
-
property
datamodel
¶
-
property
entities
¶ Notice: This has a different output < out(‘node’) >, i.e. not metadata in dataframe, because we use this property in projection. For example in DataSet.get_attributes….. :return: shortcut for SchemaPipe operations to output Entity nodes
-
property
fields
¶
-
property
get_entities
¶ - Returns
result from get_entities method that can be chained to other operations e.g. over(), out()
use out() at the end of the chained methods to retrieve the final result
-
property
parent
¶
hypermorph.schema_dms_datamodel module¶
-
class
hypermorph.schema_dms_datamodel.
DataModel
(schema, vid=None, **node_properties)¶ Bases:
hypermorph.schema_node.SchemaNode
- Notice: all get_* methods return SchemaPipe, DataPipe objects
so that they can be chained to other methods of those classes. That way we can convert and transform anything easily to many forms: keys, dataframe, SchemaNode objects, etc.
- ToDo: A method of DataModel to save it separately from Schema,
e.g. write it on disk with a serialized format (graphml) or in a database…. In the current version DataModel can be created with commands and saved in a .graphml, .gt file or it can be saved together with the Schema in a .graphml, .gt file
-
add_attribute
(entalias, **nprops)¶ - Parameters
entalias – Attribute is linked to Entities with the corresponding aliases
nprops – schema node (vertex) properties
- Returns
single Attribute object
-
add_entities
(metadata)¶ - Parameters
metadata – list of dictionaries, dictionary keys are property names of Entity node (cname, alias, …)
- Returns
Entity objects
-
add_entity
(**nprops)¶ - Parameters
nprops – schema node (vertex) properties
- Returns
single Entity object
-
property
attributes
¶ - Returns
shortcut for SchemaPipe operations to output metadata in a dataframe
-
property
components
¶ - Returns
shortcut for SchemaPipe operations to output components metadata of the datamodel in a dataframe
-
property
entities
¶ - Returns
shortcut for SchemaPipe operations to output metadata in a dataframe
-
get_attributes
(junction=None)¶ - Returns
result from get_attributes method that can be chained to other operations e.g. over(), out()
use out() at the end of the chained methods to retrieve the final result
-
get_components
()¶ - Returns
result from get_components method that can be chained to other operations e.g. over(), out()
use out() at the end of the chained methods to retrieve the final result
-
property
get_entities
¶ - Returns
result from get_entities method that can be chained to other operations e.g. over(), out()
use out() at the end of the chained methods to retrieve the final result
-
property
parent
¶
-
to_hypergraph
()¶
hypermorph.schema_dms_entity module¶
-
class
hypermorph.schema_dms_entity.
Entity
(schema, vid=None, **node_properties)¶ Bases:
hypermorph.schema_node.SchemaNode
Notice: all get_* methods return SchemaPipe, DataPipe objects so that they can be chained to other methods of those classes. That way we can convert and transform easily anything to many forms: keys, dataframe, SchemaNode objects…
-
property
attributes
¶ - Returns
shortcut for SchemaPipe operations to output metadata in a dataframe
-
property
datamodel
¶
-
get_attributes
(junction=None)¶ - Parameters
junction – True: return junction Attributes; False: return non-junction Attributes; None: return all Attributes
- Returns
return result from get_attributes method that can be chained to other operations e.g. over(), out()
use out() at the end of the chained methods to retrieve the final result
-
get_fields
(junction=None)¶ - Parameters
junction – True: return fields mapped on junction Attributes; False: return fields mapped on non-junction Attributes; None: return all fields mapped on Attributes
- Returns
Fields (node ids) that are mapped onto Attributes
Notice: In the general case, fields are mapped from more than one DataSet, Table, objects
-
get_tables
()¶ From the fields mapped on non-junction Attributes find their parents, i.e. tables. ToDo: Cover the case of fields from multiple tables mapped on attributes of the same entity :return: Table objects
-
has_mapping
()¶ - Returns
True if there are Field(s) of a Table mapped onto Attribute(s) of an Entity, otherwise False
-
property
parent
¶
-
to_hypergraph
()¶
hypermorph.schema_drs_dataset module¶
-
class
hypermorph.schema_drs_dataset.
DataSet
(schema, vid=None, **node_properties)¶ Bases:
hypermorph.schema_node.SchemaNode
DataSet is a set of data resources (tables, fields, graph datamodels) in the following data containers: SQLite database, MySQL database, CSV/TSV flat files, and graph data files
- Notice: get_* methods return SchemaPipe, DataPipe objects
so that they can be chained to other methods of those classes. That way we can convert and transform easily anything to many forms: keys, dataframe, SchemaNode objects…
-
add_fields
()¶ The structure here is hierarchical: a DataSet —has—> Tables, and each Table —has—> Fields
- Returns
new Field objects
-
add_graph_datamodel
(**nprops)¶ Add graph data model, this is a graph serialization of TRIADB data model
- Parameters
nprops – schema node (vertex) properties
- Returns
single GDM object
-
add_graph_datamodels
()¶ Add graph data models
- Returns
new GDM objects
-
add_graph_schema
(**nprops)¶ Add graph schema, this is a graph serialization of HyperMorph Schema
- Parameters
nprops – schema node (vertex) properties
- Returns
single GSH object
-
add_graph_schemata
()¶ Add graph schemata
- Returns
new GSH objects
-
add_table
(**nprops)¶ - Parameters
nprops – schema node (vertex) properties
- Returns
single Table object
-
add_tables
(metadata=None)¶ - Parameters
metadata – list of dictionaries, keys of dictionary are metadata property names of Table node
- Returns
new Table objects
-
property
components
¶ - Returns
shortcut for SchemaPipe operations to output metadata in a dataframe
-
property
connection
¶
-
property
connection_metadata
¶
-
container_metadata
(**kwargs)¶ - Returns
metadata for the data resource container e.g. metadata for a parquet file, or the tables of a database
-
property
fields
¶ - Returns
shortcut for SchemaPipe operations to output metadata in a dataframe
-
get_components
()¶ - Returns
result from get_components method that can be chained to other operations e.g. over(), out()
use out() at the end of the chained methods to retrieve the final result
-
get_connection
(db_client=None, port=None, trace=0)¶ - Parameters
db_client –
port – use port for either HTTP or native client connection to clickhouse
trace –
- Returns
-
get_fields
(mapped=None)¶ - Parameters
mapped – if True, return ONLY those fields that are mapped onto attributes; by default return all fields
- Returns
result from get_fields method that can be chained to other operations e.g. over(), out()
use out() at the end of the chained methods to retrieve the final result
-
get_graph_datamodels
()¶ - Returns
result from get_graph_datamodels method that can be chained to other operations e.g. over(), out()
use out() at the end of the chained methods to retrieve the final result
-
get_graph_schemata
()¶ - Returns
result from get_graph_schemata method that can be chained to other operations e.g. over(), out()
use out() at the end of the chained methods to retrieve the final result
-
get_tables
()¶ - Returns
result from get_tables method that can be chained to other operations e.g. over(), out()
use out() at the end of the chained methods to retrieve the final result
-
property
graph_datamodels
¶ - Returns
shortcut for SchemaPipe operations to output metadata in a dataframe
-
property
graph_schemata
¶ - Returns
shortcut for SchemaPipe operations to output metadata in a dataframe
-
property
parent
¶
-
property
tables
¶ - Returns
shortcut for SchemaPipe operations to output metadata in a dataframe
hypermorph.schema_drs_field module¶
-
class
hypermorph.schema_drs_field.
Field
(schema, vid=None, **node_properties)¶ Bases:
hypermorph.schema_node.SchemaNode
- Notice: all get_* methods return SchemaPipe, DataPipe objects
so that they can be chained to other methods of those classes. That way we can convert and transform easily anything to many forms: keys, dataframe, SchemaNode objects…
-
property
attributes
¶
-
property
metadata
¶
-
property
parent
¶
hypermorph.schema_drs_graph_datamodel module¶
-
class
hypermorph.schema_drs_graph_datamodel.
GraphDataModel
(schema, vid=None, **node_properties)¶ Bases:
hypermorph.schema_node.SchemaNode
-
load_into_schema
()¶ Load GraphDataModel data resource into TRIADB Schema in memory
Notice: Do not confuse adding a set of GraphDataModels, i.e. a set of data resources, with loading one of these graph data models into the TRIADB Schema in memory.
The latter is a different operation: it creates new TRIADB data models in the Schema, i.e. it loads metadata information about the DataModel, its Entities and Attributes into the TRIADB Schema
- Returns
DataModel object
-
property
parent
¶
hypermorph.schema_drs_graph_schema module¶
-
class
hypermorph.schema_drs_graph_schema.
GraphSchema
(schema, vid=None, **node_properties)¶ Bases:
hypermorph.schema_node.SchemaNode
GraphSchema is a data resource, a child of DataSet like a Table; DO NOT confuse it with the HyperMorph Schema. An instance of a GraphSchema resource is a serialized representation in a file with <.graphml> or <.gt> format
-
property
parent
¶
hypermorph.schema_drs_table module¶
-
class
hypermorph.schema_drs_table.
Table
(schema, vid=None, **node_properties)¶ Bases:
hypermorph.schema_node.SchemaNode
- Notice: all get_* methods return SchemaPipe, DataPipe objects
so that they can be chained to other methods of those classes. That way we can convert and transform easily anything to many forms: keys, dataframe, SchemaNode objects…
-
add_field
(**nprops)¶ - Parameters
nprops – schema node (vertex) properties
- Returns
single Field object
-
add_fields
(metadata=None)¶ - Parameters
metadata – list of dictionaries, each dictionary contains metadata column properties for a field (column) in a table
- Returns
new Field objects
-
container_metadata
(**kwargs)¶ - Returns
metadata for the data resource container e.g. metadata for columns of MySQL table
-
property
fields
¶ - Returns
shortcut for SchemaPipe operations to output metadata in a dataframe
-
get_columns
()¶ wrapper for DataPipe.get_columns() method :return: result from the get_columns method that can be chained to other operations; use out() at the end of the chained methods to retrieve the final result
-
get_fields
(mapped=None)¶ wrapper for SchemaPipe.get_fields() method :param mapped: if True, return ONLY those fields that are mapped onto attributes; by default return all fields
- Returns
result from get_fields method that can be chained to other operations e.g. over(), out()
use out() at the end of the chained methods to retrieve the final result
-
get_rows
(npartitions=None, partition_size=None)¶ wrapper for DataPipe.get_rows() method :return: result from the get_rows method that can be chained to other operations; use out() at the end of the chained methods to retrieve the final result
-
property
parent
¶
-
property
sql
¶
-
to_hypergraph
()¶
hypermorph.schema_link module¶
-
class
hypermorph.schema_link.
SchemaLink
(schema, from_node, to_node, **eprops)¶ Bases:
object
Each instance of SchemaLink links a tail node with a head node, examples: (DataModel —> Entity), (Entity —> Attribute), (Field —> Attribute), (Table —> Field), (DataSet —> Table)
Each SchemaLink has two connectors (bidirectional edges): an outgoing edge from the tail and an incoming edge to the head.
In the case of a HyperEdge (HE) node there are <Many> Outgoing Edges that start < From One > HE In the case of a HyperNode (HN) node there are <Many> Incoming Edges that end < To One > HN
SchemaLink type represents a DIRECTED MANY TO MANY RELATIONSHIP
Important Notice: Do not confuse the DIRECTION OF RELATIONSHIP with the DIRECTION OF TRAVERSING THE BIDIRECTIONAL EDGES of the SchemaLink
Many-to-Many Relationship is defined as a (Many-to-One) and (One-to-Many)
MANY side (tail node) —ONE side (outgoing edge)— —ONE side (incoming edge)— MANY side (head node)
FROM Node (fromID) === an outgoing edge ===== SchemaLink ===== an incoming edge ===> TO Node (toID)
-
property
all
¶
-
property
edge
¶
-
get_edge_property
(property_name)¶ this is used to access values that are returned from @property members of SchemaLink :param property_name: function name of the @property decorator :return:
-
get_value
(prop_name)¶ - Parameters
prop_name – Edge property name (ep_names)
- Returns
the value of property for the specific link
-
property
schema
¶
hypermorph.schema_node module¶
-
class
hypermorph.schema_node.
SchemaNode
(schema, vid=None, **vprops)¶ Bases:
object
- The SchemaNode class:
- if vid is None
create a NEW node, i.e. a new vertex on the graph with properties
- if vid is not None
initialize a node that is represented with an existing vertex with vid
Notice: All properties and methods defined here are accessible from derived classes Attribute, Entity, DataModel, DataSet, Table, Field
-
property
all
¶
-
property
all_edges_ids
¶
-
property
all_links
¶
-
property
all_nids
¶
-
property
all_nodes
¶
-
property
all_vertices
¶
-
property
descriptive_metadata
¶
-
property
dpipe
¶ Returns a
Pipe
(GenerativeBase object) that refers to an instance of SchemaNode; use this object to chain operations defined in the DataPipe class
-
get_value
(prop_name)¶ - Parameters
prop_name – Vertex property name (vp_names) or @property function name (calculated_properties) or data type properties (field_meta)
- Returns
the value of property for the specific node
-
property
in_edges_ids
¶
-
property
in_links
¶
-
property
in_nids
¶
-
property
in_nodes
¶
-
property
in_vertices
¶
-
property
key
¶
-
property
out_edges_ids
¶
-
property
out_links
¶
-
property
out_nids
¶
-
property
out_nodes
¶
-
property
out_vertices
¶
-
property
schema
¶
-
property
spipe
¶ Returns a
Pipe
(GenerativeBase object) that refers to an instance of SchemaNode; use this object to chain operations defined in the SchemaPipe class
-
property
system_metadata
¶
-
property
vertex
¶
hypermorph.schema_pipe module¶
-
class
hypermorph.schema_pipe.
SchemaPipe
(schema_node, result=None)¶ Bases:
hypermorph.utils.GenerativeBase
Implements method chaining: A query operation, e.g. projection, counting, filtering can invoke multiple method calls. Each method corresponds to a query operator such as: get_components.over().to_dataframe().out()
out() method is always at the end of the chained generative methods to return the final result
Each one of these operators returns an intermediate result, self.fetch, allowing the calls to be chained together in a single statement.
SchemaPipe methods, such as get_*(), are wrapped inside methods of classes derived from Schema and SchemaNode, so that when they are called from those methods the result can be chained to other methods of SchemaPipe. In that way we implement easily and intuitively transformations and conversions to multiple output formats.
- Notice: we distinguish between two different execution types according to the evaluation of the result
Lazy evaluation, see for example to_***() methods
Eager evaluation
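To make the chaining concrete, here is a hedged sketch adapted from the take() example later in this section; mis is assumed to be a HyperMorph/Schema instance and 414 an existing node id.

    # Each call returns self.fetch, so the operators compose in one statement;
    # out() terminates the chain and evaluates the final result eagerly.
    df = (mis.get(414)                         # a SchemaNode, e.g. a Table
             .get_fields()                     # Field node ids of the Table
             .over('nid, dim3, dim2, cname')   # project metadata columns
             .to_dataframe('dim3, dim2')       # lazy conversion step
             .out())                           # evaluate and return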
-
filter
(value=None, attribute=None, operator='eq', reset=True)¶ - Notice1: to create a filtered Graph from a list/array of nodes
that is a result of previous operations in a pipeline, leave attribute=None and value=None; to create a Graph from a list/array of nodes that is a result from the execution of other Python commands, leave attribute=None and set value=[set of nodes]
- Parameters
attribute – is a defined vertex property (node attribute) for filtering vertices of the graph (Schema nodes),
value – the value of the attribute to filter vertices of the graph
operator – e.g. comparison operator for the values of node
reset – set the Graph in unfiltered state then filter, otherwise it’s a composite filtering
- Returns
pass self.fetch to the next chainable operation
-
filter_view
(value=None, attribute=None, operator='eq', reset=True)¶ - Notice1: to create a GraphView from a list/array of nodes
that is a result of previous SchemaPipe operations, leave attribute=None and value=None; to create a GraphView from a list/array of nodes that is a result from the execution of other Python commands, leave attribute=None and set value=[set of nodes]
- Parameters
attribute – is a defined vertex property (node attribute) for filtering vertices of the graph (Schema nodes) to create a GraphView,
value – the value of the attribute to filter vertices of the graph
operator – e.g. comparison operator for the values of node
reset – set the GraphView in unfiltered state, otherwise it’s a composite filtering
- Returns
pass self.fetch to the next chainable operation
-
get_all_nodes
()¶ sets Graph or GraphView to the unfiltered state :return: all the nodes of the Graph or all the nodes of the GraphView
-
get_attributes
(junction=None)¶ - Parameters
junction – if True fetch those that are junction nodes else fetch non-junction attributes
- Returns
Attribute node ids of an Entity or Attribute node ids of a DataModel
-
get_components
()¶ Get node IDs for the components of a specific DataModel (Entity, Attribute) or DataSet (Table, Field, ….) It creates a filtered GraphView of the Schema for nodes that have dim3=SchemaNode.dim3
- Returns
self.fetch points to Entity, Attribute, Table, Field, GraphDataModel, GraphSchema nodes; these node ids are passed to the next chainable operation
-
get_datamodels
()¶ Get DataModel node IDs of data model system (dms) :return: self.fetch points to the set of DataModel node ids, these are passed to the next chainable operation
-
get_datasets
()¶ Get DataSet node IDs of data resources system (drs) :return: self.fetch points to the set of DataSet node ids, these are passed to the next chainable operation
-
get_entities
()¶ Get Entity node IDs of a DataModel or Entity node IDs of an Attribute :return: self.fetch points to Entity nodes; these are passed to the next chainable operation
-
get_fields
(mapped=None)¶ Wrapped in the Table(SchemaNode) class Get Field node IDs of a Table or Field node IDs of a DataSet :param mapped: if True, return ONLY those fields that are mapped onto attributes; by default return all fields
- Returns
self.fetch points to the set of Field node ids, these are passed to the next chainable operation
-
get_graph_datamodels
()¶ Get graph datamodel node ids :return: self.fetch that points to these node IDs
-
get_graph_schemata
()¶ Get graph schemata node ids :return: self.fetch that points to these node IDs
-
get_overview
()¶ Get an overview of systems, datasets, datamodels, etc. by filtering Schema nodes that have dim2=0 :return: self.fetch points to the set of filtered node ids; these are passed to the next chainable operation
-
get_systems
()¶ Get System node IDs including the root system :return: self.fetch points to the set of System node ids, these are passed to the next chainable operation
-
get_tables
()¶ Get Table node IDs of a DataSet :return: self.fetch points to the set of Table node ids, these are passed to the next chainable operation
-
out
(**kwargs)¶ - Returns
use out() method at the end of the chained generative methods to return the
output of SchemaNode objects displayed with the appropriate specified format and structure
-
over
(select=None)¶ - Parameters
select – projection over the selected metadata columns
- Returns
modifies self._project
-
plot
(**kwargs)¶ Graphical output to visualize hypergraphs, it is also used in out() method (see IHyperGraphPlotter.plot method) Example: mis.get(535).to_hypergraph().plot() or mis.get(535).to_hypergraph().out()
- Parameters
kwargs –
- Returns
-
property
schema_node
¶
-
take
(select, key_column='cname')¶ Take specific nodes from the result of get_*() methods :param select: list of integers (node IDs) or
list of strings (cname(s), alias(es))
Notice: all selected nodes specified must exist otherwise it will raise an exception
- Parameters
key_column – e.g. cname, alias
- Returns
a subset of numpy array with node IDs
- Notice the difference:
over() is a projection over the selected metadata columns (e.g. nid, dim3, dim2, …); take() is a projection over the selected fields of a database table or flat file (e.g. npi, city, state, …)
Example: mis.get(414).get_fields().over(‘nid, dim3, dim2, cname’)
.take(select=’npi, pacID, profID, city, state’).to_dataframe(‘dim3, dim2’).out()
-
to_dataframe
(index=None)¶ - Parameters
index – metadata column names to use in pandas dataframe index
- Returns
-
to_dict
(key_column, value_columns)¶ - Parameters
key_column – e.g. cname, alias, nid
value_columns – e.g. [‘cname, alias’]
- Returns
-
to_dict_records
(lazy=False)¶
-
to_entity
(entity_name='NEW Entity', entity_alias='NEW_ENT', entity_description=None, datamodel=None, datamodel_name='NEW DataModel', datamodel_alias='NEW_DM', datamodel_descr=None, attributes=None, as_names=None, as_types=None)¶ Map a Table object of a DataSet onto an Entity of a DataModel, there are two scenarios:
- (a) Map Table to a new Entity and selected fields (or all fields) of the table onto new attributes.
The new entity can be linked to a new datamodel (datamodel=None) or to an existing datamodel
- (b) Map selected fields (or all fields) of a table onto existing attributes of a datamodel.
It's a bipartite matching of fields with attributes with a one-to-one correspondence between fields and attributes; the user must specify the datamodel parameter.
Notice1: The Field-Attribute relationship is a Many-To-One i.e. many fields of different Entity objects are mapped onto one (same) Attribute
Notice2: In both (a) and (b) cases fields are selected with a combination of get_fields() and take() SchemaPipe operations on the table object
Example for (a): get(414).get_fields().take(‘npi, pacID, profID, last, first, gender, graduated, city, state’).
to_entity(cname=’Physician’, alias=’Phys’).out()
Example for (b): see the hedged sketch after the parameter list below.
- Parameters
entity_name –
entity_alias –
entity_description –
datamodel – create a new datamodel by default or pass an existing DataModel object
datamodel_name –
datamodel_alias –
datamodel_descr –
attributes – list of integers (Attribute IDs) or list of strings (Attribute cnames, aliases) of an existing Entity or None (default) to create new Attributes
as_names – in the case of creating new attributes, list of strings one for each new attribute
as_types – in the case of creating new attributes, list of strings one for each new attribute Notice: data types can be inferred later on when we use arrow dictionary encoding…
- Returns
An Entity object
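A hedged sketch for scenario (b), mapping selected fields onto existing attributes of an existing datamodel; mis, the node id, the field names and existing_dm are all illustrative assumptions, not taken from the source.

    # existing_dm: a DataModel object obtained earlier, e.g. with mis.get(...)
    ent = (mis.get(414)
              .get_fields()
              .take('npi, city, state')         # one field per target attribute
              .to_entity(datamodel=existing_dm,
                         attributes=['npi', 'city', 'state'])  # existing Attributes
              .out())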
-
to_fields
()¶ converts a list of Attribute objects to a list of Field objects :return: list of fields that are mapped onto an Attribute
-
to_hypergraph
()¶
-
to_keys
(lazy=False)¶
-
to_nids
(lazy=False, array=True)¶
-
to_nodes
(lazy=False)¶
-
to_tuples
(lazy=False)¶
-
to_vertices
(lazy=False)¶
hypermorph.schema_sys module¶
hypermorph.test module¶
hypermorph.utils module¶
-
class
hypermorph.utils.
DSUtils
¶ Bases:
object
Data Structure Utils Class
-
static
numpy_sorted_index
(arr, adj=False, freq=False)¶ - Parameters
arr – numpy 1d array that represents a table column of data values of the same type in the case of numpy array with string values and missing data, null values must be represented with np.NaN
adj – if True return adjacency lists
freq – if True return frequencies
- Returns
secondary index, i.e. unique values of arr in ascending order without NaN (null);
for each unique value it can also calculate:
a) the list of primary key indices, i.e. pointers to all rows of the table that contain that value, also known as adjacency lists in Graph terminology
b) the count of rows that contain that value, also known as database cardinality (selectivity), or frequency in an associative engine
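An illustrative call; the exact return structure with adj/freq is implementation-defined, so the shapes in the comment are assumptions based on the docstring.

    import numpy as np
    from hypermorph.utils import DSUtils

    # 'b' occurs in rows 0 and 2, 'a' in row 1; the sorted secondary index
    # is ['a', 'b'], with adjacency lists [[1], [0, 2]] and frequencies [1, 2].
    col = np.array(['b', 'a', 'b'])
    index = DSUtils.numpy_sorted_index(col, adj=True, freq=True)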
-
static
numpy_to_pyarrow
(np_arr, dtype=None, dictionary=True)¶ - Parameters
np_arr – numpy 1d array that represents a table column of data values of the same type
dtype – data type
dictionary – whether to use dictionary encoded form or not
- Returns
pyarrow array representation of arr
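For example (a sketch; the exact return type depends on the dictionary flag):

    import numpy as np
    from hypermorph.utils import DSUtils

    colors = np.array(['red', 'green', 'red', 'blue'])
    # dictionary=True requests PyArrow dictionary encoding: distinct values are
    # stored once, plus integer codes per row (compact for low-cardinality columns).
    pa_colors = DSUtils.numpy_to_pyarrow(colors, dictionary=True)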
-
static
pyarrow_chunked_to_dict
(chunked_array)¶ - Parameters
chunked_array – PyArrow ChunkedArray
- Returns
PyArrow Array / DictionaryArray
-
static
pyarrow_dict_to_arr
(dict_array)¶ - Parameters
dict_array – PyArrow DictionaryArray
- Returns
PyArrow 1d Array
-
static
pyarrow_dtype_from_string
(dtype, dictionary=False, precision=9, scale=3)¶ - Parameters
dtype – string that specifies the PyArrow data type
dictionary – pyarrow dictionary data type, i.e. pa.dictionary(pa.int32(), pa.vtype())
precision – for decimal128bit width arrow data type (number of digits in the number - integer+fractional)
scale – for decimal128bit width arrow data type (number of digits for the fractional part)
- Returns
pyarrow data type from a string
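Illustrative calls; the exact set of accepted dtype strings is defined by the implementation, so 'int32' and 'decimal' here are assumptions.

    from hypermorph.utils import DSUtils

    int_type = DSUtils.pyarrow_dtype_from_string('int32')
    # precision and scale apply to the decimal128 case: 9 digits in total,
    # 3 of them in the fractional part.
    dec_type = DSUtils.pyarrow_dtype_from_string('decimal', precision=9, scale=3)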
-
static
pyarrow_get_dtype
(arr)¶ - Parameters
arr – PyArrow 1d Array either dictionary encoded or not
- Returns
value type of PyArrow array elements
-
static
pyarrow_record_batch_to_table
(batch)¶
-
static
pyarrow_sort
(array, ascending=True)¶ - Parameters
array – PyArrow Array
ascending –
- Returns
-
static
pyarrow_table_to_record_batch
(table)¶ - Parameters
table – PyArrow Table
- Returns
PyArrow RecordBatch
-
static
pyarrow_to_numpy
(pa_arr)¶ - Parameters
pa_arr – PyArrow 1d Array or DictionaryArray
- Returns
NumPy 1d array
-
static
pyarrow_vtype_to_numpy_vtype
(arr)¶ - Parameters
arr – PyArrow 1d Array
- Returns
NumPy value type that is equivalent of PyArrow value type
-
class
hypermorph.utils.
DotDict
¶ Bases:
dict
dot.notation access to dictionary attributes
Example:
person_dict = {'first_name': 'John', 'last_name': 'Smith', 'age': 32}
address_dict = {'country': 'UK', 'city': 'Sheffield'}
person = DotDict(person_dict)
person.address = DotDict(address_dict)
print(person.first_name, person.last_name, person.age, person.address.country, person.address.city)
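A minimal sketch of such a class; this is the classic recipe, not necessarily the shipped implementation.

    class DotDict(dict):
        # person.age reads person['age']; missing keys return None via dict.get
        __getattr__ = dict.get
        __setattr__ = dict.__setitem__   # person.x = 1  ->  person['x'] = 1
        __delattr__ = dict.__delitem__   # del person.x  ->  del person['x']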
-
class
hypermorph.utils.
FileUtils
¶ Bases:
object
-
static
change_cwd
(fpath)¶
-
static
feather_to_arrow_schema
(source)¶
-
static
feather_to_arrow_table
(file_location, select=None, limit=None, offset=None, **pyarrow_kwargs)¶ This is using pyarrow.feather.read_table() https://arrow.apache.org/docs/python/generated/pyarrow.feather.read_table.html#pyarrow.feather.read_table
- Parameters
file_location – full path location of the file
select – use a subset of columns from feather file
limit – limit on the number of records to return
offset – exclude the first number of rows. Notice: do not confuse offset with the number of rows to skip at the start of the flat file; in pandas.read_csv, offset can also be used as skiprows
pyarrow_kwargs – other parameters that are passed to pyarrow.feather.read_table
- Returns
-
static
flatfile_delimiter
(file_type)¶ - Parameters
file_type – CSV, TSV; these have default delimiters ',' and '\t' (tab) respectively
- Returns
default delimiter or the specified delimiter in the argument
-
static
flatfile_drop_extention
(fname)¶
-
static
flatfile_header
(file_type, file_location, delimiter=None)¶ - Parameters
file_type – CSV, TSV; these have default delimiters ',' and '\t' (tab) respectively
delimiter – 1-character string specifying the boundary between fields of the record
file_location – full path location of the file with an extension (.tsv, .csv)
- Returns
field names in a list
-
static
flatfile_to_pandas_dataframe
(file_type, file_location, select=None, as_columns=None, as_types=None, index=None, partition_size=None, limit=None, offset=None, delimiter=None, nulls=None, **pandas_kwargs)¶ Read rows from flat file and convert them to pandas dataframe with pandas.read_csv https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
- Parameters
file_type – CSV, TSV; these have default delimiters ',' and '\t' (tab) respectively
file_location – full path location of the file
delimiter – 1-character string specifying the boundary between fields of the record
nulls – list of strings that denote nulls e.g. [‘N’]
partition_size – number of records to use for each partition or target size of each partition, in bytes
select – use a subset of columns from the flat file
as_columns – user specified column names for pandas dataframe (list of strings)
as_types – dictionary with column names as keys and data types as values this is used when we read data from flat files and we want to disable type inference on those columns
index – column names to be used in pandas dataframe index
limit – limit on the number of records to return
offset – exclude the first number of rows. Notice: do not confuse offset with the number of rows to skip at the start of the flat file; in pandas.read_csv, offset can also be used as skiprows
pandas_kwargs – other arguments of pandas read_csv method
- Returns
pandas dataframe
Example of read_csv():
read_csv(source, sep='|', index_col=False, nrows=10, skiprows=3, header=0,
usecols=['catsid', 'catpid', 'catcost', 'catfoo', 'catchk'],
dtype={'catsid': int, 'catpid': int, 'catcost': float, 'catfoo': float, 'catchk': bool},
parse_dates=['catdate'])
-
static
flatfile_to_pyarrow_table
(file_type, file_location, select=None, as_columns=None, as_types=None, partition_size=None, limit=None, offset=None, skip=0, delimiter=None, nulls=None)¶ Read columnar data from CSV files https://arrow.apache.org/docs/python/csv.html
- Parameters
file_type – CSV, TSV; these have default delimiters ',' and '\t' (tab) respectively
file_location – full path location of the file
delimiter – 1-character string specifying the boundary between fields of the record
nulls – list of strings that denote nulls e.g. [‘N’]
partition_size – number of records to use for each partition or target size of each partition, in bytes
select – list of column names to include in the pyarrow Table, default None (all columns)
as_columns – user specified column names for pandas dataframe (list of strings)
as_types – Map column names to column types (disabling type inference on those columns)
limit – limit on the number of rows to return
offset – exclude the first number of rows. Notice: do not confuse offset with skip; offset is applied after we read the table
skip – number of rows to skip at the start of the flat file
- Returns
pyarrow in-memory table
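An illustrative call; the file path and column names are assumptions.

    from hypermorph.utils import FileUtils

    tbl = FileUtils.flatfile_to_pyarrow_table(
        file_type='CSV',
        file_location='/data/physicians.csv',
        select=['npi', 'city', 'state'],   # read only these columns
        limit=1000)                        # cap on the number of rows returned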
-
static
flatfile_to_python_lists
(file_type, file_location, nrows=10, skip_rows=1, delimiter=None)¶ - Parameters
file_type – CSV, TSV; these have default delimiters ',' and '\t' (tab) respectively
delimiter – 1-character string specifying the boundary between fields of the record
file_location – full path location of the file with an extension (.tsv, .csv)
nrows – number of rows to read from the file
skip_rows – number of rows to skip, default 1 skip the header of the file
- Returns
rows of the file as python lists
-
static
get_cwd
()¶
-
static
get_filenames
(path, extension='json', window_title='Choose files', gui=False, select=None)¶
-
static
get_full_path
(path)¶
-
static
get_full_path_filename
(p, f)¶
-
static
get_full_path_parent
(path)¶
-
static
json_to_dict
(fname)¶
-
static
parquet_metadata
(source, **pyarrow_kwargs)¶
-
static
parquet_to_arrow_schema
(source, **pyarrow_kwargs)¶
-
static
parquet_to_arrow_table
(file_location, select=None, limit=None, offset=None, arrow_encoding=False, **pyarrow_kwargs)¶ This is using pyarrow.parquet.read_table() https://arrow.apache.org/docs/python/generated/pyarrow.parquet.read_table.html
- Parameters
file_location – full path location of the file
select – use a subset of columns from parquet file
limit – limit on the number of records to return
offset – exclude the first number of rows. Notice: do not confuse offset with the number of rows to skip at the start of the flat file; in pandas.read_csv, offset can also be used as skiprows
arrow_encoding – PyArrow dictionary encoding
pyarrow_kwargs – other parameters that are passed to pyarrow.parquet.read_table
- Returns
-
static
pyarrow_read_record_batch
(file_location, table=False)¶ - Parameters
file_location –
table –
- Returns
Either PyArrow RecordBatch, or PyArrow Table if table=True
-
static
pyarrow_table_to_feather
(table, file_location, **feather_kwargs)¶ Write a Table to Feather format :param table: pyarrow Table :param file_location: full path location of the feather file :param feather_kwargs: https://arrow.apache.org/docs/python/generated/pyarrow.feather.write_feather.html#pyarrow.feather.write_feather :return:
-
static
pyarrow_table_to_parquet
(table, file_location, **pyarrow_kwargs)¶ Write a Table to Parquet format :param table: pyarrow Table :param file_location: full path location of the parquet file :param pyarrow_kwargs: row_group_size, version, use_dictionary, compression (see https://pyarrow.readthedocs.io/en/latest/generated/pyarrow.parquet.write_table.html#pyarrow.parquet.write_table) :return:
-
static
pyarrow_write_record_batch
(record_batch, file_location)¶ - Parameters
record_batch – PyArrow RecordBatch
file_location –
- Returns
-
static
write_json
(data, fname)¶
-
class
hypermorph.utils.
GenerativeBase
¶ Bases:
object
http://derrickgilland.com/posts/introduction-to-generative-classes-in-python/ A Python Generative Class is defined as a class that returns or clones, i.e. generates, itself when accessed by certain means. This type of class can be used to implement method chaining or to mutate an object's state without modifying the original class instance.
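A minimal generative class in this spirit; an assumed sketch, not the actual HyperMorph implementation.

    import copy

    class GenerativeBase:
        def _generate(self):
            # Clone self so each chained call mutates a fresh instance,
            # leaving the original object untouched.
            return copy.copy(self)

    class Pipe(GenerativeBase):
        def __init__(self):
            self.ops = []
        def over(self, select):
            new = self._generate()
            new.ops = self.ops + [('over', select)]   # rebind, don't mutate
            return new
        def out(self):
            return self.ops

    Pipe().over('nid, cname').out()   # [('over', 'nid, cname')]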
-
class
hypermorph.utils.
MemStats
¶ Bases:
object
Compare memory statistics with those of free -m. Units are in MiB (mebibytes), 1 MiB = 2^20 bytes
-
property
available
¶
-
property
buffers
¶
-
property
cached
¶
-
property
cpu
¶
-
property
difference
¶
-
property
free
¶
-
property
mem
¶
-
print_stats
()¶
-
property
total
¶
-
property
used
¶
-
class
hypermorph.utils.
PandasUtils
¶ Bases:
object
pandas dataframe utility methods
-
static
dataframe
(iterable, columns=None, ndx=None)¶ - Parameters
iterable – e.g. list like objects
columns – comma separated string or list of strings; labels to use for the columns of the resulting dataframe
ndx – comma separated string or list of strings; column names to use for the index of the resulting dataframe
- Returns
pandas dataframe with an optional index
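For instance (names are illustrative):

    from hypermorph.utils import PandasUtils

    rows = [(1, 'John'), (2, 'Jane')]
    # Per the docstring, columns and ndx accept a list of strings
    # (or a comma separated string).
    df = PandasUtils.dataframe(rows, columns=['id', 'name'], ndx=['id'])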
-
static
dataframe_cardinality
(df)¶
-
static
dataframe_concat_columns
(df1, df2)¶
-
static
dataframe_memory_usage
(df, deep=False)¶
-
static
dataframe_selectivity
(df)¶
-
static
dataframe_to_pyarrow_table
(df, columns=None, schema=None, index=False)¶ - Parameters
df – pandas dataframe
columns – List of columns to be converted. If None, use all columns
schema – the expected pyarrow schema of the pyarrow Table
index – Whether to store the index as an additional column in the resulting Table.
- Returns
pyarrow.Table
-
static
dataframes_to_html
(*df_stylers)¶
-
static
dict_to_dataframe
(d, labels)¶
-
hypermorph.utils.
bytes2mb
(b)¶
-
hypermorph.utils.
get_size
(obj)¶ sum size of object & members.
-
hypermorph.utils.
highlight_states
(s)¶
-
hypermorph.utils.
session_time
()¶
-
hypermorph.utils.
split_comma_string
(names)¶
-
hypermorph.utils.
sql_construct
(select, frm, where=None, group_by=None, having=None, order=None, limit=None, offset=None)¶
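A hedged example; whether the SQL keywords are supplied by the caller or added by the function is implementation-defined, so writing them out explicitly here is an assumption.

    from hypermorph.utils import sql_construct

    # Assemble a query from its clauses; clause text is illustrative.
    q = sql_construct(select='SELECT name, total_rows',
                      frm='FROM system.tables',
                      where="WHERE database = 'default'",
                      order='ORDER BY total_rows DESC',
                      limit=10)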
-
hypermorph.utils.
zip_with_scalar
(num, arr)¶ Use: to generate hyperbond (hb2, hb1), hyperatom (ha2, ha1) tuples :param num: scalar value :param arr: array of values :return: generator of tuples in the form (i, num) where i in arr
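Per the docstring, the generator pairs each element of arr with the scalar.

    from hypermorph.utils import zip_with_scalar

    pairs = list(zip_with_scalar(2, [10, 11, 12]))
    # pairs == [(10, 2), (11, 2), (12, 2)]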
Module contents¶
This file is part of HyperMorph operational API for information management and data transformations on Associative Semiotic Hypergraph Development Framework (C) 2015-2019 Athanassios I. Hatzis
HyperMorph is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License v.3.0 as published by the Free Software Foundation.
HyperMorph is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.
You should have received a copy of the GNU Affero General Public License along with HyperMorph. If not, see <https://www.gnu.org/licenses/>.