prepropy package

Submodules

prepropy.eda module

prepropy.eda.eda(df, target)

Generates a dictionary of summary statistics for the given data frame

Parameters

df (pandas.DataFrame) – input dataframe to be analyzed
target (string) – target column name

Returns

dict – summary statistics of the given data frame.
cor – the correlation map

Examples

>>> import pandas as pd
>>> from prepropy.eda import eda
>>> url1 = "https://archive.ics.uci.edu/ml/machine-learning-databases/"
>>> url2 = "wine-quality/winequality-red.csv"
>>> url = url1 + url2
>>> df = pd.read_csv(url, sep=";")
>>> target = "quality"
>>> res = eda(df, target)
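
The keys of the dictionary returned by eda are not listed above, apart from the correlation map. As a rough illustration of the kind of summary such a function might assemble, here is a minimal sketch using plain pandas. The helper name `summarize`, every dictionary key except the general idea of `cor`, and the toy data are assumptions made for this example; this is not prepropy's implementation.

```python
import pandas as pd


def summarize(df: pd.DataFrame, target: str) -> dict:
    """Hypothetical helper (not part of prepropy): collect basic summary statistics."""
    numeric = df.select_dtypes(include="number")
    cor = numeric.corr()  # pairwise Pearson correlation of numeric columns
    return {
        "n_rows": len(df),
        "n_missing": df.isna().sum().to_dict(),  # missing values per column
        "describe": numeric.describe(),          # count, mean, std, quartiles
        "cor": cor,                              # the correlation map
        # correlation of each feature with the target (assumes a numeric target)
        "target_cor": cor[target].drop(target),
    }


# Toy data, invented purely for illustration
df = pd.DataFrame(
    {
        "alcohol": [9.0, 10.0, 11.0, 12.0],
        "pH": [3.5, 3.2, 3.3, 3.1],
        "quality": [5, 5, 6, 7],
    }
)
res = summarize(df, "quality")
print(res["cor"].round(2))
```

Returning the pieces in a dictionary lets the caller pick out a single statistic, such as `res["cor"]`, without recomputing the rest.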

prepropy.imputation module

class prepropy.imputation.imputation(method)

Bases: object

Creates an instance of the imputation class, used to impute missing data

Parameters

method (str) – the imputation method to use, for example 'mean'
values (numpy array) – an array with values to be imputed. Default None

Returns

An instance of the imputation class

Examples

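>>> import numpy as np
>>> import pandas as pd
>>> from prepropy.imputation import imputation
>>> test_df = pd.DataFrame([[np.nan, 2, 3], [2, np.nan, 4], [5, 6, 7]])
>>> imputer = imputation('mean')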

fill(data_for_fill)

Fills the missing values in each column

Parameters: data_for_fill (pandas.core.frame.DataFrame) – the pandas dataframe whose missing values should be filled
Returns: A dataframe with the missing values imputed

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from prepropy.imputation import imputation
>>> test_df = pd.DataFrame([[np.nan, 2, 3], [2, np.nan, 4], [5, 6, 7]])
>>> imputer = imputation('mean')
>>> imputer.fit(test_df)
>>> new = imputer.fill(test_df)
>>> test_df2 = pd.DataFrame([[1, 10, 8], [5, 2, 6], [np.nan, 3, np.nan]])
>>> new2 = imputer.fill(test_df2)

fit(data)

Calculates the value to be imputed for each column

Parameters: data (pandas.core.frame.DataFrame) – a pandas dataframe
Returns: An instance of the imputation class

Examples

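>>> import numpy as np
>>> import pandas as pd
>>> from prepropy.imputation import imputation
>>> test_df = pd.DataFrame([[np.nan, 2, 3], [2, np.nan, 4], [5, 6, 7]])
>>> imputer = imputation('mean')
>>> imputer.fit(test_df)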
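
The fit and fill methods describe a two-step workflow: fit computes one value per column, and fill then uses those stored values on any data frame. A minimal sketch of that pattern for mean imputation, written with plain pandas, might look like the following. The class name `MeanImputer` and the attribute `values_` are hypothetical and chosen for this example only; prepropy's own class may be implemented differently.

```python
import numpy as np
import pandas as pd


class MeanImputer:
    """Hypothetical stand-in illustrating the fit/fill pattern (not prepropy's code)."""

    def fit(self, data: pd.DataFrame) -> "MeanImputer":
        # Column means; pandas skips NaN entries by default
        self.values_ = data.mean()
        return self

    def fill(self, data_for_fill: pd.DataFrame) -> pd.DataFrame:
        # Replace NaN in each column with the value learned for that column
        return data_for_fill.fillna(self.values_)


test_df = pd.DataFrame([[np.nan, 2, 3], [2, np.nan, 4], [5, 6, 7]])
imputer = MeanImputer().fit(test_df)
new = imputer.fill(test_df)

# A second frame is filled with the means learned from test_df, not its own
test_df2 = pd.DataFrame([[1, 10, 8], [5, 2, 6], [np.nan, 3, np.nan]])
new2 = imputer.fill(test_df2)
print(new2)
```

Separating the two steps means the values learned from one data set (typically the training data) can be reused to fill another, as the second frame in the example shows.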

prepropy.scaler module

prepropy.scaler.scaler(X_train, X_Valid, X_test, scale_features, scaler_type='StandardScaler')

Scales the selected numerical features of the training, validation, and test sets using the specified scaler type

Parameters

X_train (pandas.core.frame.DataFrame, numpy array or list) – the training set
X_Valid (pandas.core.frame.DataFrame, numpy array or list) – the validation set
X_test (pandas.core.frame.DataFrame, numpy array or list) – the test set
scale_features (list of strings) – the list of numerical features to be scaled
scaler_type (string) – the type of scaling to perform on the numerical columns. Default 'StandardScaler'

Returns

dict containing three dataframes with scaled features

Return type

dict

Examples

>>> from prepropy.scaler import scaler
>>> scaler(X_train, X_Valid, X_test, scale_features, scaler_type="MaxAbsScaler")
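
To make the idea of scaling three data sets together more concrete, here is a minimal sketch of max-abs scaling in plain pandas. It assumes that the scaling statistics are learned from the training set only and then applied to all three sets, which is common practice but is not stated in this reference. The function name `max_abs_scale`, the dictionary keys, and the toy data are likewise assumptions for illustration, not prepropy's actual behaviour.

```python
import pandas as pd


def max_abs_scale(X_train, X_valid, X_test, scale_features):
    """Hypothetical sketch (not prepropy's code): divide each selected
    column by its maximum absolute value in the training set."""
    max_abs = X_train[scale_features].abs().max()
    max_abs = max_abs.replace(0, 1)  # leave all-zero columns unchanged

    scaled = {}
    for name, frame in (("X_train", X_train), ("X_valid", X_valid), ("X_test", X_test)):
        out = frame.copy()  # do not modify the caller's data
        out[scale_features] = out[scale_features] / max_abs
        scaled[name] = out
    return scaled


# Toy data, invented purely for illustration
X_train = pd.DataFrame({"a": [1.0, -4.0, 2.0], "b": [10.0, 5.0, -20.0], "label": ["x", "y", "z"]})
X_valid = pd.DataFrame({"a": [2.0], "b": [10.0], "label": ["x"]})
X_test = pd.DataFrame({"a": [8.0], "b": [-5.0], "label": ["y"]})

scaled = max_abs_scale(X_train, X_valid, X_test, ["a", "b"])
print(scaled["X_train"])
```

Columns not named in `scale_features` (here `label`) pass through untouched, and values in the validation or test set can fall outside [-1, 1] when they exceed the training maximum.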
