Write a Python docstring
An example of how to create a docstring for a given Python function. We specify the Python version, paste in the code, ask for a docstring in a comment, and then end the prompt with the characteristic opening of a docstring (""") so the model continues by writing the docstring itself.
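The prompt described above can be assembled programmatically. The following is a minimal sketch, not part of the original example: the helper name `build_docstring_prompt` and the blank-line layout are our own assumptions, while the request comment is copied from the example input. The version comment comes first, then the code, then the request, and finally an opening `"""` for the model to continue.

```python
import textwrap


def build_docstring_prompt(source: str, python_version: str = "3.7") -> str:
    """Assemble a docstring-completion prompt (hypothetical helper).

    Layout: a version comment, the code, a comment asking for a docstring,
    and a trailing opening triple quote for the model to continue from.
    """
    header = f"# Python {python_version}"
    request = "# An elaborate, high quality docstring for the above function:"
    # Remove common indentation so code copied from an indented block still lines up.
    body = textwrap.dedent(source).strip()
    return f'{header}\n\n{body}\n\n{request}\n"""'


example_source = '''
def add(a, b):
    return a + b
'''
print(build_docstring_prompt(example_source))
```

Ending on the bare `"""` is the key design choice: the model's most likely continuation is the docstring body rather than more code.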
Example input
# Python 3.7

import pandas as pd
from sklearn.model_selection import train_test_split

def randomly_split_dataset(folder, filename, split_ratio=[0.8, 0.2]):
    df = pd.read_json(folder + filename, lines=True)
    train_name, test_name = "train.jsonl", "test.jsonl"
    df_train, df_test = train_test_split(df, test_size=split_ratio[1], random_state=42)
    df_train.to_json(folder + train_name, orient='records', lines=True)
    df_test.to_json(folder + test_name, orient='records', lines=True)

randomly_split_dataset('finetune_data/', 'dataset.jsonl')

# An elaborate, high quality docstring for the above function:
"""
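For comparison with the generated results, here is one hand-written docstring for the function in the example input. It is our own illustration of what an accurate completion could contain, not model output, and it uses the Google-style `Args:`/`Returns:` layout as an assumed convention. Two deliberate changes from the prompt are marked in comments: the default `split_ratio` is a tuple to avoid a mutable default argument, and the imports are moved inside the function only so this sketch can be defined without pandas or scikit-learn installed.

```python
def randomly_split_dataset(folder, filename, split_ratio=(0.8, 0.2)):  # tuple default (changed from list)
    """Randomly split a JSON Lines dataset into train and test files.

    Reads ``folder + filename`` as JSON Lines, shuffles and splits the rows
    with scikit-learn's ``train_test_split`` (``random_state=42``, so the
    split is reproducible), and writes the two parts to ``train.jsonl`` and
    ``test.jsonl`` in the same folder, overwriting any existing files.

    Args:
        folder: Directory path including a trailing separator, because it is
            joined to the file names by plain string concatenation.
        filename: Name of the input ``.jsonl`` file inside ``folder``.
        split_ratio: ``(train_fraction, test_fraction)``. Only the second
            element is used; it is passed to ``train_test_split`` as
            ``test_size``.

    Returns:
        None. The two splits are written to disk.
    """
    # Imported here only so the sketch can be defined without these packages;
    # in normal code these imports belong at module top level.
    import pandas as pd
    from sklearn.model_selection import train_test_split

    df = pd.read_json(folder + filename, lines=True)
    train_name, test_name = "train.jsonl", "test.jsonl"
    df_train, df_test = train_test_split(df, test_size=split_ratio[1], random_state=42)
    df_train.to_json(folder + train_name, orient="records", lines=True)
    df_test.to_json(folder + test_name, orient="records", lines=True)
```

Calling `help(randomly_split_dataset)` or `inspect.getdoc(randomly_split_dataset)` displays this text, which makes it easy to compare against what the model produced.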
Examples of real generated results
This function creates two files in a directory that represent different splits of the dataset and reads them into memory. The split ratio (1 < n_samples / len(sample_set)) is configurable. The sample dataset will only be read as long it does not already exist!
Test with the finetuning dataset. This one is a bit more involved because we need to know about the specific steps and order in which they were used during training on this dataset. Only run this script if you have the finetuning_trainings_example_gigantic_mouse_cars_model_simulation_folder_path.txt file! The output looks like this: training-steps-during-training-on-finetunning-data-for-evaluating-multiple-runs-successfully... \ - first-update-step-seen-2457939605925585584389062185466573769263044687597231670331386271117282224091247
Randomly splits dataset to train/test dataset using scikit-learn. If there are two data sets with same name in different folders, it will not return duplicates. Example : data\input_data\train1 and data\input_data\dontExist\test\ The first one is kept while rest of them are removed as an exception. If you want a new training/validation dataset instead then pass `batches` parameter, i.e.,: \``train_batches=\tuesday\to Friday;monday-to-monday: 1\`` which means that we use all Monday's rows from input_data (but not Tuesday ones). \note You have to make sure that your folder structure looks like this:
Given a jsonl file (e.g., one generated by pytorch), you can easily split it in two and apply deep learning on each part. To do this we need to use numpy arrays instead of objects because np is not defined inside a JSON line. We have written an example at the top of this document that uses pydotnet for visualization. Here are more details about how such code might look like. .. versionadded:: 1.5.2 Options: * `train` - Use only data from "training set" as input. * `test` - Use only data from "test" as input. * :meth:`~sklearn.model_selection.cross_validation.ShuffleSplit` - Use random