Step

Step to load and export data in an easy and consistent way. … is present at notebook level when …


GStep

 GStep (*args, **kwargs)

Singleton Step used at package level
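A package-level singleton usually means that module-level functions forward to one shared instance, so settings changed through one call persist for later calls. The sketch below illustrates that pattern only; the _Step stand-in class and the save and set_root functions are hypothetical and are not stdflow's implementation.

```python
from typing import Any


class _Step:
    """Hypothetical stand-in for Step: it only remembers a default root."""

    def __init__(self) -> None:
        self.root = "./data"

    def save(self, data: Any, root: str = ":default") -> str:
        # ':default' falls back to the value stored on the instance
        resolved_root = self.root if root == ":default" else root
        return f"saving {data!r} under {resolved_root}"


# One shared instance created at import time (the "singleton")
_GLOBAL_STEP = _Step()


def save(data: Any, root: str = ":default") -> str:
    """Module-level function that forwards to the shared instance."""
    return _GLOBAL_STEP.save(data, root=root)


def set_root(root: str) -> None:
    """Changing the shared instance affects every later module-level call."""
    _GLOBAL_STEP.root = root
```

Because every module-level call goes through _GLOBAL_STEP, a root set once in a notebook is reused by subsequent calls until it is changed again.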


Step

 Step (root:str|None='./data', attrs:str|list[str]|None=None,
       version:str|None=':default', file_name:str|None=':auto',
       method_in:str|object|None=':auto', root_in:str|None=':default',
       attrs_in:str|list[str]|None=':default', step_in:str|None=None,
       version_in:str|None=':default', file_name_in:str|None=':default',
       method_out:str|object|None=':auto', root_out:str|None=':default',
       attrs_out:str|list[str]|None=':default', step_out:str|None=None,
       version_out:str|None=':default', file_name_out:str|None=':default',
       md_all_files:list[FileMetaData]=None,
       md_direct_input_files:list[FileMetaData]=None)

Step class for easy data loading and exporting. Also available at package level.

Type Default Details
root str | None ./data Default root folder of data path. Not exported in metadata
attrs str | list[str] | None None Default attributes part of the path
version str | None :default Default version name (cannot use :last or other custom variables)
file_name str | None :auto Specify the file name. See file_name_in and file_name_out for more details on :auto behaviour
method_in str | object | None :auto Default method to load the data. Can be a function with path as first argument or a string among [csv, excel, xlsx, xls, parquet, json, pickle, feather, hdf, sql, pkl].
root_in str | None :default Default root folder when loading [not recommended, use root instead]
attrs_in str | list[str] | None :default Default attributes when loading
step_in str | None None Default step name when loading
version_in str | None :default Default version name when loading
file_name_in str | None :default Default file name when loading
method_out str | object | None :auto Default method to save the data. Can be a function with path as first argument or a string among [csv, excel, xlsx, xls, parquet, json, pickle, feather, hdf, sql, pkl].
root_out str | None :default Default root folder when saving [not recommended, use root instead]
attrs_out str | list[str] | None :default Default attributes when saving
step_out str | None None Default step name when saving
version_out str | None :default Default version name when saving
file_name_out str | None :default Default file name when saving
md_all_files list[FileMetaData] None Internal. Do not use
md_direct_input_files list[FileMetaData] None Internal. Do not use


Step.load

 Step.load (root:Union[str,Literal[':default']]=':default',
            attrs:Union[list,str,NoneType,Literal[':default']]=':default',
            step:Union[str,NoneType,Literal[':default']]=':default',
            version:Union[str,NoneType,Literal[':default',':last',':first']]=':default',
            file_name:Union[str,Literal[':default',':auto']]=':default',
            method:Union[str,object,Literal[':default',':auto']]=':default',
            alias:str=':ignore', file_glob:bool=False,
            verbose:bool=False, **kwargs)

Load data from a path such as root/*attrs/step/version/file_name.

Type Default Details
root str | Literal[':default'] :default Root folder of the data. Not exported in metadata
attrs list | str | None | Literal[':default'] :default Attributes part of the path
step str | None | Literal[':default'] :default Step name, converted to step_{step_name} in the path
version str | None | Literal[':default', ':last', ':first'] :default Version name, converted to v_{version_name} in the path. If :default, uses :last. If :last, uses the last version based on its name; if :first, uses the first version based on its name
file_name str | Literal[':default', ':auto'] :default File name. Automatically inferred if there is only one file in the directory
method str | object | Literal[':default', ':auto'] :default Method to load the data. Can be a function with path as first argument or a string among [csv, excel, xlsx, xls, parquet, json, pickle, feather, hdf, sql, pkl].
alias str :ignore Alias of the dataset to document it and its columns. (feature in development)
file_glob bool False If True, file_name can be a glob pattern
verbose bool False If True, print info messages
kwargs
Returns Tuple[Any, dict] | Any Loaded data
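The version and file_name rules above can be sketched as two small resolution steps. The helper names pick_version and infer_file_name are hypothetical, and the assumptions that version folders are named v_{version} and that names are compared as plain strings are mine. With the default %Y%m%d%H%M naming, string order matches chronological order because every name has the same width.

```python
from __future__ import annotations


def pick_version(names: list[str], version: str) -> str:
    """Resolve ':last' / ':first' (':default' acts as ':last') from folder names."""
    # Only folders following the v_{version_name} convention are candidates
    candidates = sorted(n[2:] for n in names if n.startswith("v_"))
    if version in (":default", ":last", ":first"):
        if not candidates:
            raise FileNotFoundError("no version folder found")
        return candidates[0] if version == ":first" else candidates[-1]
    return version  # an explicit version name is used as-is


def infer_file_name(names: list[str]) -> str:
    """file_name inference: only possible when exactly one file is present."""
    if len(names) != 1:
        raise ValueError(f"cannot infer file name from {len(names)} files")
    return names[0]
```

For example, with folders v_202309010900 and v_202310121113, :last (and therefore :default) resolves to 202310121113.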


Step.save

 Step.save (data:Union[pandas.core.frame.DataFrame,Any],
            root:Union[str,Literal[':default']]=':default',
            attrs:Union[list,str,NoneType,Literal[':default']]=':default',
            step:Union[str,NoneType,Literal[':default']]=':default',
            version:Union[str,NoneType,Literal[':default'],stdflow.stdflow_types.strftime_type.Strftime]=':default',
            file_name:Union[str,Literal[':default',':auto']]=':default',
            method:Union[str,object,Literal[':default',':auto']]=':default',
            alias:str=':ignore', export_viz_tool:bool=False,
            verbose:bool=False, **kwargs)

Save data to a path such as root/attrs/step/version/file_name.

Type Default Details
data pd.DataFrame | Any Data to save
root str | Literal[':default'] :default Root folder of the data. Not exported in metadata
attrs list | str | None | Literal[':default'] :default Attributes part of the path
step str | None | Literal[':default'] :default Step name, converted to step_{step_name} in the path
version str | None | Literal[':default'] | Strftime :default Version name, converted to v_{version_name} in the path. By default, uses the current date and time in format %Y%m%d%H%M
file_name str | Literal[':default', ':auto'] :default File name. Automatically inferred if there is only one input file
method str | object | Literal[':default', ':auto'] :default Method to save the data. Can be a function whose first argument is the path, or a string among [csv, excel, xlsx, xls, parquet, json, pickle, feather, hdf, sql, pkl]
alias str :ignore Alias of the dataset to document it and its columns. (feature in development)
export_viz_tool bool False If True, export html view of the data and the pipeline it comes from
verbose bool False If True, print info messages
kwargs
Returns DataPath Path object describing where the data is saved
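The default save version is the current time formatted with %Y%m%d%H%M, so a save made on 2023-10-12 at 11:13 lands in a v_202310121113 folder, and two saves within the same minute produce the same version name. The sketch below reproduces that naming and the root/attrs/step/version/file_name layout. The helpers default_version_name and save_path are hypothetical, and the rule that a None step adds no folder is inferred from the verbose log in the example that follows, not from stdflow's source.

```python
from __future__ import annotations

from datetime import datetime


def default_version_name(now: datetime | None = None, fmt: str = "%Y%m%d%H%M") -> str:
    """Version used for version=':default' on save: current time, minute precision."""
    return (now or datetime.now()).strftime(fmt)


def save_path(root: str, attrs: str | None, step: str | None,
              version: str, file_name: str) -> str:
    """Build root/attrs/[step_{step}/]v_{version}/file_name."""
    parts = [root]
    if attrs is not None:
        parts.append(attrs)
    if step is not None:
        parts.append(f"step_{step}")  # step name is prefixed with step_
    parts += [f"v_{version}", file_name]  # version is prefixed with v_
    return "/".join(parts)
```

Calling save_path("../demo_project/data", "lake", None, default_version_name(datetime(2023, 10, 12, 11, 13)), "countries of the world.csv") yields the same path as the "Saving data to" line in the example output.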


Step.var

 Step.var (key, value, force=False)

Set a variable that can be overwritten if it is specified in a StepRunner or Pipeline.
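The override idea behind Step.var can be illustrated with a small store in which a value declared in the notebook is used unless an outside caller injected a replacement. The StepVars class below is hypothetical: how stdflow actually passes overrides from a StepRunner or Pipeline, what var returns, and what force=True does are not described here, so force is not modeled.

```python
from __future__ import annotations

from typing import Any


class StepVars:
    """Hypothetical store: notebook defaults that an external runner may override."""

    def __init__(self, overrides: dict[str, Any] | None = None) -> None:
        # Values injected by the caller (e.g. a runner) take precedence
        self._overrides = dict(overrides or {})
        self.values: dict[str, Any] = {}

    def var(self, key: str, value: Any) -> Any:
        """Record key with its notebook value, or with the override if one exists."""
        resolved = self._overrides.get(key, value)
        self.values[key] = resolved
        return resolved
```

Run interactively, StepVars().var("threshold", 0.5) keeps 0.5; run with StepVars({"threshold": 0.9}), the same line picks up 0.9 without editing the notebook.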

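import pandas as pd
from stdflow import Step  # assumed import path; Step is documented as available at package level

step = Step()
df = pd.read_csv("countries of the world.csv")  # hypothetical local copy of the dataset

step.save(
    df,
    root="../demo_project/data",
    attrs="lake",
    file_name="countries of the world.csv",
    version=":default",
    method="csv",
    verbose=True,
)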
sf.root = ../demo_project/data
sf.attrs = lake
sf.step = None
sf.version = %Y%m%d%H%M
sf.file_name = countries of the world.csv
sf.method = csv
Saving data to ../demo_project/data/lake/v_202310121113/countries of the world.csv
attrs=lake::step_name=None::version=202310121113::file_name=countries of the world.csv