prepropy package
Submodules
prepropy.eda module
prepropy.eda.eda(df, target)
Generates a dictionary of summary statistics for the given data frame.
- Parameters
df (pandas.DataFrame) – input dataframe to be analyzed
target (string) – target column name
- Returns
dict – a dictionary of summary statistics for the given data frame.
cor – the correlation map
Examples
>>> import pandas as pd
>>> from prepropy.eda import eda
>>> url1 = "https://archive.ics.uci.edu/ml/machine-learning-databases/"
>>> url2 = "wine-quality/winequality-red.csv"
>>> url = url1 + url2
>>> df = pd.read_csv(url, sep=";")
>>> target = "quality"
>>> res = eda(df, target)
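To make the shape of such a result concrete, here is a minimal sketch of a summary dictionary built with plain pandas. The helper name `summarize` and every key except `cor` are assumptions made for illustration; this is not prepropy's implementation, and the small frame is made-up data standing in for the wine-quality file.

```python
import pandas as pd

# Made-up rows standing in for the wine-quality data (illustration only)
df = pd.DataFrame({
    "alcohol": [9.4, 9.8, 9.8, 11.2],
    "pH": [3.51, 3.20, 3.26, 3.16],
    "quality": [5, 5, 5, 6],
})
target = "quality"


def summarize(df, target):
    """Hypothetical helper: collect basic summary statistics in a dict."""
    features = df.drop(columns=[target])
    return {
        "n_rows": len(df),                # number of observations
        "n_features": features.shape[1],  # columns other than the target
        "describe": df.describe(),        # count, mean, std, quartiles
        "cor": df.corr(),                 # pairwise Pearson correlations
    }


res = summarize(df, target)
print(res["cor"])
```

Keeping the correlation matrix as a DataFrame lets the caller pass it straight to a plotting library to draw the correlation map.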
prepropy.imputation module
class prepropy.imputation.imputation(method)
Bases: object
Generates an instance of the imputation class for imputing missing data.
- Parameters
method (str) – the imputation method to use, for example 'mean'.
values (numpy array) – an array with values to be imputed. Default None
- Returns
An instance of the imputation class
Examples
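>>> import numpy as np
>>> import pandas as pd
>>> from prepropy.imputation import imputation
>>> test_df = pd.DataFrame([[np.nan, 2, 3], [2, np.nan, 4], [5, 6, 7]])
>>> imputer = imputation('mean')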
fill(data_for_fill)
Fills the missing values in each column.
- Parameters
data_for_fill (pandas.core.frame.DataFrame) – a pandas dataframe whose missing values are to be filled
- Returns
A dataframe with the missing values imputed
Examples
>>> test_df = pd.DataFrame([[np.nan, 2, 3], [2, np.nan, 4], [5, 6, 7]])
>>> imputer = imputation('mean')
>>> imputer.fit(test_df)
>>> new = imputer.fill(test_df)
>>> test_df2 = pd.DataFrame([[1, 10, 8], [5, 2, 6], [np.nan, 3, np.nan]])
>>> new2 = imputer.fill(test_df2)
fit(data)
Calculates the value to be imputed for each column.
- Parameters
data (pandas.core.frame.DataFrame) – a pandas dataframe
- Returns
An instance of the imputation class
Examples
>>> test_df = pd.DataFrame([[np.nan, 2, 3], [2, np.nan, 4], [5, 6, 7]])
>>> imputer = imputation('mean')
>>> imputer.fit(test_df)
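The fit/fill pattern above can be sketched with plain pandas. The class below is a hypothetical stand-in, not prepropy's code: it assumes the 'mean' method and assumes that fill reuses the column values computed by fit, which is what the second fill call in the example suggests.

```python
import numpy as np
import pandas as pd


class MeanImputer:
    """Hypothetical stand-in showing the fit/fill pattern for mean imputation."""

    def fit(self, data):
        # Column means of the fitting data; NaN entries are skipped by default
        self.values_ = data.mean()
        return self

    def fill(self, data_for_fill):
        # Replace NaN in each column with the mean learned in fit();
        # the Series index is matched against the column labels
        return data_for_fill.fillna(self.values_)


test_df = pd.DataFrame([[np.nan, 2, 3], [2, np.nan, 4], [5, 6, 7]])
imputer = MeanImputer().fit(test_df)
new = imputer.fill(test_df)

# A second frame is filled with the means learned from test_df
test_df2 = pd.DataFrame([[1, 10, 8], [5, 2, 6], [np.nan, 3, np.nan]])
new2 = imputer.fill(test_df2)
print(new2)
```

Separating fit from fill means that values learned on one dataset (typically the training split) can be applied unchanged to others, which avoids leaking statistics from validation or test data.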
prepropy.scaler module
prepropy.scaler.scaler(X_train, X_Valid, X_test, scale_features, scaler_type='StandardScaler')
Scales the selected numerical features using the requested scaler type.
- Parameters
X_train (pandas.core.frame.DataFrame, numpy array or list) – The training data
X_Valid (pandas.core.frame.DataFrame, numpy array or list) – The validation data
X_test (pandas.core.frame.DataFrame, numpy array or list) – The test data
scale_features (list of strings) – The list of numerical features to be scaled
scaler_type (string) – The type of scaling to perform on the numerical columns, for example "StandardScaler" (the default) or "MaxAbsScaler".
- Returns
dict containing three dataframes with scaled features
- Return type
dict
Examples
>>> from prepropy.scaler import scaler
>>> scaler(X_train, X_Valid, X_test, scale_features, scaler_type="MaxAbsScaler")
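As an illustration of what scaling three splits typically involves, here is a minimal pandas-only sketch of standard scaling. The function name `scale_splits`, the dictionary keys, and the sample data are all assumptions; prepropy's actual implementation and return keys may differ. The sketch assumes the statistics are computed on the training split only and then applied to all three splits.

```python
import pandas as pd


def scale_splits(X_train, X_valid, X_test, scale_features):
    """Hypothetical sketch: standardize chosen columns using training statistics."""
    mean = X_train[scale_features].mean()
    # ddof=0 gives the population standard deviation, the convention
    # used by scikit-learn's StandardScaler
    std = X_train[scale_features].std(ddof=0)

    out = {}
    splits = (("X_train", X_train), ("X_valid", X_valid), ("X_test", X_test))
    for name, X in splits:
        scaled = X.copy()  # leave the caller's frame untouched
        scaled[scale_features] = (X[scale_features] - mean) / std
        out[name] = scaled
    return out


# Made-up data: one numerical column to scale, one column left as-is
X_train = pd.DataFrame({"age": [20.0, 30.0, 40.0], "city": ["a", "b", "c"]})
X_valid = pd.DataFrame({"age": [30.0], "city": ["a"]})
X_test = pd.DataFrame({"age": [50.0], "city": ["d"]})

out = scale_splits(X_train, X_valid, X_test, ["age"])
print(out["X_test"])
```

Columns not listed in `scale_features` pass through unchanged, so categorical columns can stay in the same frame.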