Quickstart¶
This part of the documentation gives a quick introduction to getting started with conML.
Define main ingredients¶
Begin by importing the conML module:
>>> import conML
Now, request the constructor. The constructor needs a list of tuples, each
consisting of an instantiated unsupervised machine learning model from the
scikit-learn library and a corresponding abbreviation. In addition, the type
of construction must be specified; currently only conceptual construction is
supported:
>>> from sklearn.cluster import KMeans
>>> from sklearn.cluster import AgglomerativeClustering
>>> unsup_models = [("Kme", KMeans()), ("Agg", AgglomerativeClustering())]
>>> constructor = conML.construction("conceptual", unsup_models)
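If you want to sanity-check an unsupervised model before handing it to the constructor, the following sketch uses only the standard scikit-learn clustering API and is independent of conML (the toy data and parameters are illustrative assumptions):

```python
# Minimal sketch: cluster two well-separated blobs with KMeans to confirm
# the model exposes the usual scikit-learn fit_predict interface.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two tight blobs of 2-D points, centered at 0 and 5.
data = np.vstack([rng.normal(0, 0.1, (20, 2)),
                  rng.normal(5, 0.1, (20, 2))])

labels = KMeans(n_clusters=2, n_init=10).fit_predict(data)
print(len(set(labels)))  # 2 distinct cluster labels
```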
The second component is the feature selector. It combines a filter method and an embedded method: the embedded method is applied as soon as a predefined number of features or samples is exceeded; otherwise the filter method is used:
>>> from sklearn.feature_selection import VarianceThreshold, SelectFromModel
>>> from sklearn.ensemble import ExtraTreesClassifier
>>> variance = VarianceThreshold(2000)
>>> embedded = SelectFromModel(ExtraTreesClassifier())
>>> selector = conML.feature_selection(filter_method=variance, embedded_method=embedded)
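To see what the filter method does on its own, here is a self-contained sketch using only the scikit-learn API (the toy matrix and the much smaller threshold are illustrative assumptions, not values from the example above):

```python
# Sketch: VarianceThreshold drops every feature whose variance falls
# below the threshold. The first column is constant (variance 0).
import numpy as np
from sklearn.feature_selection import VarianceThreshold

X = np.array([[0.0, 1.0, 10.0],
              [0.0, 2.0, 20.0],
              [0.0, 3.0, 30.0]])

selector = VarianceThreshold(threshold=0.5)
X_reduced = selector.fit_transform(X)
print(X_reduced.shape)  # the constant column is removed
```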
Next, you define the reconstructor. The definition follows the same scheme as before, except this time you select supervised learning models:
>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.svm import SVC
>>> from sklearn.neighbors import KNeighborsClassifier
>>> sup_models = [("Rf", RandomForestClassifier()),
...               ("Svc", SVC()),
...               ("Kne", KNeighborsClassifier())]
>>> reconstructor = conML.reconstruction("conceptual", sup_models)
Finally, you define the deconstructor. Its only dependency is the previously defined reconstructor:
>>> deconstructor = conML.deconstruction("conceptual", reconstructor)
The knowledge search operates on blocks of type pandas.DataFrame. It is recommended to pass the blocks with the help of a generator. First, let's load the example dataset. The features should be named «0.0.n», where n is the feature number. The T column should contain the timestamps; Sigma and Z should be empty.
>>> import os
>>> import pandas as pd
>>> path = os.path.join(os.path.expanduser("~"), ".conML", "toyset.csv")
>>> columns_names = [f"0.0.{i}" for i in range(1, 353)] + ["T", "Sigma", "Z"]
>>> df = pd.read_table(path, index_col=False, sep=" ", names=columns_names)
>>> df["Z"], df["Sigma"] = "", ""
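If the toy dataset is not available at that path, you can build a small synthetic frame with the same column layout. This is a hedged sketch: the random values and the sample count are made up, only the column scheme («0.0.n» features plus T, Sigma, Z, with 352 features as above) matches the example:

```python
# Sketch: synthetic DataFrame following the expected column scheme.
import numpy as np
import pandas as pd

n_samples, n_features = 300, 352
feature_names = [f"0.0.{i}" for i in range(1, n_features + 1)]
df = pd.DataFrame(np.random.default_rng(0).random((n_samples, n_features)),
                  columns=feature_names)
df["T"] = range(n_samples)     # timestamps
df["Z"], df["Sigma"] = "", ""  # must be empty
print(df.shape)  # 352 features + 3 bookkeeping columns
```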
After loading the example dataset, define a generator that yields blocks of 100 samples.
>>> block_size = 100
>>> def generate_blocks():
...     for start in range(0, df.shape[0], block_size):
...         yield df.iloc[start:start+block_size]
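To convince yourself that the blocking logic behaves as intended, here is a self-contained sketch on a small synthetic frame (the 250-row frame is an illustrative assumption; the last block is simply shorter when the row count is not a multiple of the block size):

```python
# Sketch: every block has at most block_size rows, and together the
# blocks cover the whole frame in order.
import pandas as pd

df = pd.DataFrame({"x": range(250)})
block_size = 100

def generate_blocks():
    for start in range(0, df.shape[0], block_size):
        yield df.iloc[start:start + block_size]

sizes = [len(block) for block in generate_blocks()]
print(sizes)  # [100, 100, 50]
```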
Starting the knowledge search¶
After defining the individual components and the block generator, the knowledge search can
be started. Use a context manager to get a KnowledgeSearcher
object.
After each block is processed, the database and the unused fraction of the block are returned.
Save them in a list for later analysis. To track the knowledge search, pass True
to the stdout parameter:
>>> components = (constructor, selector, reconstructor, deconstructor)
>>> dbs, haldes = [], []
>>> with conML.knowledge_searcher(*components, stdout=True) as searcher:
...     for block in generate_blocks():
...         db, halde = searcher.search(block)
...         dbs.append(db)
...         haldes.append(halde)
Saving the knowledge database¶
Now that the knowledge search has completed successfully, save the database to your hard drive:
>>> home_path = os.path.expanduser("~")
>>> dbs[-1].save(home_path)