prepropy package

Submodules

prepropy.eda module

prepropy.eda.eda(df, target)[source]

Generates a dictionary of summary statistics for the given data frame

Parameters
  • df (pandas.DataFrame) – input dataframe to be analyzed

  • target (string) – target column name

Returns

  • dict – summary statistics of the given data frame.

  • cor – the correlation map

Examples

>>> import pandas as pd
>>> from prepropy.eda import eda
>>> url1 = "https://archive.ics.uci.edu/ml/machine-learning-databases/"
>>> url2 = "wine-quality/winequality-red.csv"
>>> url = url1 + url2
>>> df = pd.read_csv(url, sep=";")
>>> target = "quality"
>>> res = eda(df, target)
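
To make the idea of a summary-statistics dictionary concrete, here is a minimal sketch in plain pandas. It is not the prepropy implementation: the helper name summarize, the dictionary keys other than cor, and the small inline data frame are all assumptions made for illustration.

```python
import pandas as pd


def summarize(df: pd.DataFrame, target: str) -> dict:
    """Hypothetical helper: gather a few summary statistics into a dict."""
    numeric = df.select_dtypes(include="number")
    cor = numeric.corr()  # pairwise Pearson correlations
    return {
        "shape": df.shape,                      # (rows, columns)
        "missing": df.isna().sum().to_dict(),   # missing values per column
        "describe": numeric.describe(),         # count, mean, std, quartiles
        "cor": cor,                             # the correlation map
        "target_cor": cor[target].drop(target), # correlation with the target
    }


# Small made-up frame standing in for the wine-quality data.
df = pd.DataFrame({
    "alcohol": [9.4, 9.8, 10.5, 11.2],
    "pH": [3.51, 3.20, 3.26, 3.16],
    "quality": [5, 5, 6, 7],
})
summary = summarize(df, "quality")
print(summary["shape"])
print(summary["cor"].round(2))
```

Returning a dictionary keeps each statistic addressable by name, so a caller can pull out only the correlation map or only the missing-value counts. The sketch assumes the target column is numeric.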

prepropy.imputation module

class prepropy.imputation.imputation(method)[source]

Bases: object

Generates an instance of an imputation class for imputation on missing data

Parameters
  • method (str) – the imputation method to use, e.g. 'mean'.

  • values (numpy array) – an array with values to be imputed. Default None

Returns

An instance of the imputation class

Return type

imputation

Examples

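>>> import numpy as np
>>> import pandas as pd
>>> from prepropy.imputation import imputation
>>> test_df = pd.DataFrame([[np.nan, 2, 3], [2, np.nan, 4], [5, 6, 7]])
>>> imputer = imputation('mean')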

fill(data_for_fill)[source]

Fills the missing values in each column

Parameters

data_for_fill (pandas.core.frame.DataFrame) – a pandas dataframe whose missing values are to be filled

Returns

A dataframe with the missing values imputed

Return type

pandas.core.frame.DataFrame

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from prepropy.imputation import imputation
>>> test_df = pd.DataFrame([[np.nan, 2, 3], [2, np.nan, 4], [5, 6, 7]])
>>> imputer = imputation('mean')
>>> imputer.fit(test_df)
>>> new = imputer.fill(test_df)
>>> test_df2 = pd.DataFrame([[1, 10, 8], [5, 2, 6], [np.nan, 3, np.nan]])
>>> new2 = imputer.fill(test_df2)

fit(data)[source]

Calculates the value to be imputed for each column

Parameters

data (pandas.core.frame.DataFrame) – a pandas dataframe

Returns

An instance of the imputation class

Return type

imputation

Examples

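>>> import numpy as np
>>> import pandas as pd
>>> from prepropy.imputation import imputation
>>> test_df = pd.DataFrame([[np.nan, 2, 3], [2, np.nan, 4], [5, 6, 7]])
>>> imputer = imputation('mean')
>>> imputer.fit(test_df)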
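
The split between fit and fill suggests that fill values are learned once and can then be reused on other data. The sketch below illustrates that pattern for mean imputation in plain pandas. The class MeanImputerSketch and its attribute fill_values_ are hypothetical stand-ins, not the prepropy class, and the actual behaviour of prepropy.imputation may differ.

```python
import numpy as np
import pandas as pd


class MeanImputerSketch:
    """Hypothetical stand-in illustrating a fit/fill workflow."""

    def fit(self, data: pd.DataFrame) -> "MeanImputerSketch":
        # Column means; pandas skips NaN when averaging.
        self.fill_values_ = data.mean()
        return self

    def fill(self, data_for_fill: pd.DataFrame) -> pd.DataFrame:
        # Fill each column with the value learned in fit; returns a new frame.
        return data_for_fill.fillna(self.fill_values_)


test_df = pd.DataFrame([[np.nan, 2, 3], [2, np.nan, 4], [5, 6, 7]])
imputer = MeanImputerSketch().fit(test_df)
new = imputer.fill(test_df)

# A second frame is filled with the means learned from test_df,
# not with its own column means.
test_df2 = pd.DataFrame([[1, 10, 8], [5, 2, 6], [np.nan, 3, np.nan]])
new2 = imputer.fill(test_df2)
print(new)
print(new2)
```

Reusing the fitted values matters when the second frame is validation or test data: filling it with statistics from the training frame avoids leaking information from held-out rows.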

prepropy.scaler module

prepropy.scaler.scaler(X_train, X_Valid, X_test, scale_features, scaler_type='StandardScaler')[source]

Scales the given numerical features using the requested scaler type

Parameters
  • X_train (pandas.core.frame.DataFrame, numpy array or list) – The training data

  • X_Valid (pandas.core.frame.DataFrame, numpy array or list) – The validation data

  • X_test (pandas.core.frame.DataFrame, numpy array or list) – The test data

  • scale_features (list of strings) – The list of numerical features to be scaled

  • scaler_type (string) – The type of scaling to perform on the numerical columns.

Returns

dict containing three dataframes with scaled features

Return type

dict

Examples

>>> scaler(X_train, X_Valid, X_test, scale_features, scaler_type="MaxAbsScaler")
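
As a rough illustration of scaling three splits together, the sketch below standardizes the selected columns using statistics from the training split only. It is written in plain pandas and is not the prepropy implementation: the helper name scale_sketch, the dictionary keys, and the sample frames are assumptions, and only standard scaling (zero mean, unit variance) is shown.

```python
import pandas as pd


def scale_sketch(X_train, X_valid, X_test, scale_features):
    """Hypothetical helper: standardize columns with training statistics."""
    mean = X_train[scale_features].mean()
    std = X_train[scale_features].std(ddof=0)  # population standard deviation
    scaled = {}
    splits = [("X_train", X_train), ("X_valid", X_valid), ("X_test", X_test)]
    for name, frame in splits:
        out = frame.copy()  # leave the caller's frame untouched
        out[scale_features] = (frame[scale_features] - mean) / std
        scaled[name] = out
    return scaled


# Made-up splits with one numerical and one categorical column.
X_train = pd.DataFrame({"age": [20.0, 30.0, 40.0], "city": ["a", "b", "c"]})
X_valid = pd.DataFrame({"age": [30.0, 50.0], "city": ["a", "c"]})
X_test = pd.DataFrame({"age": [10.0], "city": ["b"]})

scaled = scale_sketch(X_train, X_valid, X_test, ["age"])
print(scaled["X_train"])
print(scaled["X_valid"])
```

Computing the mean and standard deviation on the training split alone, then applying them to the validation and test splits, keeps the held-out data from influencing the transformation. Columns not listed in scale_features pass through unchanged.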

Module contents