Tecan OD Analyzer¶
Description¶
tecan_od_analyzer is a Python package for analysing optical density (OD) measurements taken from the Tecan Nano plate reader.
The tool parses the individual xlsx files from the plate reader and merges them into a single xlsx file using the autoflow_parser library. The merged file is read as a dataframe and every sample is labelled according to the calc.tsv file provided by the user. The labels distinguish the purpose of each sample: some samples are used for growth rate estimation and plotting, while others are used to estimate the volume loss.
Once the samples are labelled, the volume loss throughout the culture is estimated and its effect is corrected for using a simple regression model. Outlier detection and growth phase estimation are then carried out with the croissance package, after which growth rate plots and summary statistics are produced. The library can also interpolate OD measurements for processed samples at any given time.
Installation¶
Installation using pip¶
pip install tecan_od_analyzer
Installation from GitHub using pip¶
pip install git+https://github.com/biosustain/AutoFlow-HTC
Usage¶
tecan_od_analyzer can be used from the command line by executing it in the directory where the xlsx files are located. The outputs are gathered in a new directory called “results”.
Command line usage¶
Standard usage :¶
tecan_od_analyzer
The default command produces growth phase estimations, summary statistics on the estimations, and growth rate plots split only by species. By default, the volume loss correction is computed.
Options :¶
tecan_od_analyzer --resultsdir RESULTSDIR
Specifies where the results
will be written; can be a directory name or a path.
tecan_od_analyzer --path PATH
Specifies where the data is located.
Without this option, the program runs in the directory where it is executed.
tecan_od_analyzer --estimations
Outputs only estimations for
every sample in a text file.
tecan_od_analyzer --figures
Outputs only the growth curves.
tecan_od_analyzer --summary
Outputs only the estimations for
every species and bioshaker, as well as boxplots of the growth rate
annotation parameters.
tecan_od_analyzer --individual
Outputs the growth curves for
every sample individually.
tecan_od_analyzer --bioshaker
Splits the visualization of the
growth rate plots according to the bioshaker and colors them by species.
tecan_od_analyzer --bioshakercolor
Splits the visualization of the
growth rate plots according to species and does not color them by
bioshaker.
tecan_od_analyzer --interpolationplot
Outputs growth rate curves
instead of scatter plots.
tecan_od_analyzer --interpolation
Computes interpolation of
samples given the measure time and outputs an xlsx file with the
estimations.
tecan_od_analyzer --volumeloss
Disables the volume loss
correction. By default, the volume loss
correction is always computed.
tecan_od_analyzer --exportsvg
With this option, plots will be
saved as .svg rather than .png files. This is preferred if they are
intended for a publication and allows for modifications in Illustrator.
tecan_od_analyzer --onlyspecies
Splits the visualization of the
growth rate plots according to species and bioshaker.
tecan_od_analyzer --legendoff
Disables display of the legend
in plots.
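When the analysis is launched from a script rather than a shell, the documented options can be assembled into a command line programmatically. The following is a minimal sketch: `build_command` and `run_analysis` are hypothetical helpers, not part of the package, and they only know the flags listed above.

```python
import subprocess

# Flags documented above that take no value.
BOOLEAN_FLAGS = {
    "estimations", "figures", "summary", "individual", "bioshaker",
    "bioshakercolor", "interpolationplot", "interpolation", "volumeloss",
    "exportsvg", "onlyspecies", "legendoff",
}
# Flags documented above that take a value.
VALUE_FLAGS = {"resultsdir", "path"}


def build_command(**options):
    """Return the argv list for a tecan_od_analyzer call (hypothetical helper)."""
    command = ["tecan_od_analyzer"]
    for name, value in options.items():
        if name in VALUE_FLAGS:
            command += [f"--{name}", str(value)]
        elif name in BOOLEAN_FLAGS:
            # Boolean flags are only added when switched on.
            if value:
                command.append(f"--{name}")
        else:
            raise ValueError(f"unknown option: {name}")
    return command


def run_analysis(**options):
    """Run the command; requires tecan_od_analyzer to be installed and on PATH."""
    return subprocess.run(build_command(**options), check=True)
```

For example, `run_analysis(path="data", resultsdir="results_svg", exportsvg=True)` would run the tool on a hypothetical `data` directory and save the plots as .svg files.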
Input¶
Standard required input¶
In order to run the program, the user has to execute it in the directory where the data is located. The inputs correspond to the ones required by autoflow_parser (log file, xlsx files, etc.).
Furthermore, to classify the samples, a file stating the purpose of each sample is needed. This file must be a tab-separated file (.tsv) with the following format:
Sample_ID | gr_calc | vl_calc | Species | Drop_out |
---|---|---|---|---|
BS1.A1 | TRUE | FALSE | TRUE | |
BS1.A2 | FALSE | TRUE | FALSE | |
… | … | … | … | … |
The column headers must be written exactly as shown in the table. In the Sample_ID, the bioshaker must appear at the beginning of the string.
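The format described above can be checked before running the analysis. The sketch below uses only the standard library and is not the package's own reader. It assumes that the bioshaker is the part of Sample_ID before the first dot, and the species names and Drop_out values in the example are placeholders.

```python
import csv
import io

REQUIRED_HEADERS = ["Sample_ID", "gr_calc", "vl_calc", "Species", "Drop_out"]


def read_calc_tsv(handle):
    """Parse a calc.tsv-style file and return one dict per sample."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [h for h in REQUIRED_HEADERS if h not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"calc.tsv is missing columns: {missing}")
    samples = []
    for row in reader:
        # Assumption: the bioshaker is the part of Sample_ID before the first dot.
        row["bioshaker"] = row["Sample_ID"].split(".", 1)[0]
        samples.append(row)
    return samples


def is_true(value):
    """Interpret the TRUE/FALSE strings used in the table."""
    return value.strip().upper() == "TRUE"


# Hypothetical example content; the species names are placeholders.
example = io.StringIO(
    "Sample_ID\tgr_calc\tvl_calc\tSpecies\tDrop_out\n"
    "BS1.A1\tTRUE\tFALSE\tspecies_1\tFALSE\n"
    "BS1.A2\tFALSE\tTRUE\tspecies_1\tFALSE\n"
)
samples = read_calc_tsv(example)
growth_samples = [s["Sample_ID"] for s in samples if is_true(s["gr_calc"])]
volume_loss_samples = [s["Sample_ID"] for s in samples if is_true(s["vl_calc"])]
```

Here `growth_samples` holds the wells used for growth rate estimation and `volume_loss_samples` the wells used to estimate the volume loss.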
OD interpolation required input¶
To compute the estimations, the user must provide a tsv file named od_measurements.tsv with the following format:
Sample_ID | Time | Regression_used |
---|---|---|
BS1.A1_ | 0.9 | well |
BS1.A2_ | 0.02 | mean |
… | … | … |
For the Regression_used column, two options are possible. The well option interpolates a given OD using only the data of the corresponding well/sample. The mean option computes the interpolation using all the samples that share the same species and bioshaker.
Note that the numbers in the Time column must be written with dots, not commas, as the decimal separator. The unit of the Time column is hours. The Sample_ID must be followed by the species ID.
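The constraints on od_measurements.tsv can be verified with a short script before running the interpolation. This is a sketch using only the standard library, not the package's own parser. `validate_od_measurements` is a hypothetical helper and the species suffix in the example is a placeholder.

```python
import csv
import io

EXPECTED_HEADERS = ["Sample_ID", "Time", "Regression_used"]
ALLOWED_REGRESSIONS = {"well", "mean"}


def validate_od_measurements(handle):
    """Return (sample_id, time_in_hours, regression) tuples or raise ValueError."""
    reader = csv.DictReader(handle, delimiter="\t")
    if reader.fieldnames != EXPECTED_HEADERS:
        raise ValueError(f"expected headers {EXPECTED_HEADERS}, got {reader.fieldnames}")
    rows = []
    for line_number, row in enumerate(reader, start=2):
        time_text = row["Time"].strip()
        # Decimal commas are rejected explicitly so the error message is clear.
        if "," in time_text:
            raise ValueError(f"line {line_number}: use a dot, not a comma, in {time_text!r}")
        regression = row["Regression_used"].strip()
        if regression not in ALLOWED_REGRESSIONS:
            raise ValueError(f"line {line_number}: unknown regression {regression!r}")
        rows.append((row["Sample_ID"], float(time_text), regression))
    return rows


# Hypothetical example content; "species_1" stands in for a real species ID.
example = io.StringIO(
    "Sample_ID\tTime\tRegression_used\n"
    "BS1.A1_species_1\t0.9\twell\n"
    "BS1.A2_species_1\t0.02\tmean\n"
)
measurements = validate_od_measurements(example)
```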
Plotting options¶
The plots can be customized by selecting how to group the samples and combine them on a single plot. By default, all the samples of the same species are shown in one plot. The plots can also be generated separately and split or color-labelled by bioshaker. The legends can also be disabled, which is useful when many different strains are displayed. The output images can be generated as .png or .svg files.
The different options can be consulted by typing :
tecan_od_analyzer --help
or tecan_od_analyzer -h
Results¶
All times in the results are reported in hours.
The Results directory contains an example of the data obtained by running the program with the following command : tecan_od_analyzer -bc
Figures¶
- lm_volume_loss.png: plot of the volume loss correlation against time.
- Growth rate plots according to the specified options. The file name changes depending on the plotting option; it usually contains the sample ID and the bioshaker.
- Boxplots of the linear phase parameters, split by species and bioshaker. The parameters include the intercept, the beginning and end of the linear phase, the slope, etc.
Estimations / Linear phase estimations¶
- annotations.xlsx: file containing the estimated linear phase parameters for all samples.
- errors.txt: file containing the list of samples for which the linear phase estimation resulted in an error.
- Data_series.xlsx: file containing all the data points after dilution and volume loss correction. The outliers have also been removed.
Summary statistics¶
summary_stats.xlsx: file containing summary statistics of the estimated parameters, grouped by species, by bioshaker, and by both.
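The three groupings mentioned above can be illustrated with a small standard-library sketch. It is not the package's implementation. The row keys `species`, `bioshaker` and `slope`, the numeric values, and the choice of mean, standard deviation and count are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarise(annotations, keys, parameter):
    """Group annotation rows by `keys` and return mean, std and count of `parameter`."""
    groups = defaultdict(list)
    for row in annotations:
        groups[tuple(row[k] for k in keys)].append(row[parameter])
    return {
        group: {
            "mean": mean(values),
            # The sample standard deviation needs two or more values.
            "std": stdev(values) if len(values) > 1 else float("nan"),
            "n": len(values),
        }
        for group, values in groups.items()
    }


# Hypothetical annotation rows; names and numbers are illustrative only.
annotations = [
    {"species": "species_1", "bioshaker": "BS1", "slope": 0.5},
    {"species": "species_1", "bioshaker": "BS2", "slope": 0.7},
    {"species": "species_2", "bioshaker": "BS1", "slope": 0.3},
]
by_species = summarise(annotations, ["species"], "slope")
by_bioshaker = summarise(annotations, ["bioshaker"], "slope")
by_both = summarise(annotations, ["species", "bioshaker"], "slope")
```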
Temporary growth rate calculation¶
A temporary alternative is provided until the underlying issue with the implemented growth rate calculation is fixed. The specific growth rates between every time step are calculated and written to an .xlsx file, and the progression of specific growth rates is plotted for every well. These files can be found in the Temporary_GR_check folder inside the produced Results folder.
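The specific growth rate between two consecutive readings is conventionally defined as μ = (ln OD₂ − ln OD₁) / (t₂ − t₁). The sketch below assumes that definition; the exact calculation and file layout used by the package are not reproduced here, and `specific_growth_rates` is a hypothetical helper.

```python
import math


def specific_growth_rates(times_h, od_values):
    """Return the specific growth rate (1/h) between consecutive readings."""
    if len(times_h) != len(od_values):
        raise ValueError("times_h and od_values must have the same length")
    rates = []
    for i in range(len(times_h) - 1):
        dt = times_h[i + 1] - times_h[i]
        # mu = (ln OD_next - ln OD_current) / (t_next - t_current)
        rates.append((math.log(od_values[i + 1]) - math.log(od_values[i])) / dt)
    return rates


# Illustrative data: an OD that doubles every hour.
rates = specific_growth_rates([0.0, 1.0, 2.0], [0.1, 0.2, 0.4])
```

With an OD that doubles every hour, each step yields a rate close to ln 2 per hour.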
Contributing¶
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.
License¶
MIT
Credits¶
The autoflow_parser part of this package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.
tecan_od_analyzer methods¶
Main module.
- tecan_od_analyzer.tecan_od_analyzer.argument_parser(argv_list=None)¶
- Assesses the input arguments and outputs flags to run the different functions according to the user's needs.
- Parameters
argv_list – List of arguments provided by the user when running the program.
- Returns
flags
- tecan_od_analyzer.tecan_od_analyzer.compensation_lm(cor_df, df_gr, df_vl600, flag_svg=False, flag_loff=False)¶
Given the correlation between volume and time, builds and plots a linear model and applies the correction to the growth measurements using that model. Returns a figure with the linear model and a dataframe with the corrected growth rate measurements.
- Parameters
df_gr – dataframe containing only growth rate measurements and differential time in hours.
cor_df – dataframe containing correlation measures between volume loss and time for different bioshakers.
- Returns
fig: figure representing the linear model between the correlation and the time for every bioshaker.
df_gr_comp_out: dataframe containing corrected growth rate measurements and differential time in hours.
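The linear model step can be illustrated with an ordinary least-squares fit of a volume-loss signal against time for one bioshaker. This is a standard-library sketch with made-up numbers, not the code of compensation_lm. How the fitted line is then used to correct the growth measurements is specific to the package and is not reproduced here.

```python
def fit_line(times_h, values):
    """Ordinary least-squares fit of values = intercept + slope * time."""
    n = len(times_h)
    if n < 2 or n != len(values):
        raise ValueError("need two or more paired observations")
    mean_t = sum(times_h) / n
    mean_v = sum(values) / n
    ss_tt = sum((t - mean_t) ** 2 for t in times_h)
    if ss_tt == 0:
        raise ValueError("all time points are identical")
    ss_tv = sum((t - mean_t) * (v - mean_v) for t, v in zip(times_h, values))
    slope = ss_tv / ss_tt
    intercept = mean_v - slope * mean_t
    return intercept, slope


def predict(intercept, slope, time_h):
    """Evaluate the fitted line at a given time in hours."""
    return intercept + slope * time_h


# Hypothetical volume-loss signal for one bioshaker, in arbitrary units.
intercept, slope = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 1.1, 1.2, 1.3])
value_at_5h = predict(intercept, slope, 5.0)
```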
- tecan_od_analyzer.tecan_od_analyzer.estimation_writter(df_data_series, df_annotations, error_list)¶
Writes an xlsx file with the estimations for every sample and writes the errors to a log file.
- Parameters
df_data_series – dataframe containing the time series without outliers.
df_annotations – dataframe containing the annotations of the linear phase.
error_list – list containing the samples not estimated by croissance due to noisy data.
- Returns
series_xlsx_file: file containing the estimations and IDs
annotations_xlsx_file: file containing the data series and IDs without outliers of the non-estimated samples
log_file: file containing the non-estimated samples
- tecan_od_analyzer.tecan_od_analyzer.exponential(x, intercep, slope, n0)¶
Calculates the OD for a given time using the parameters estimated with the croissance package.
- tecan_od_analyzer.tecan_od_analyzer.gr_estimation(df_gr_final)¶
Removes outliers for every sample and outputs growth rate estimates for every given sample ID as a text file.
- Parameters
df_gr_final – dataframe containing growth rate measurements and differential time in hours.
- Returns
estimations: list of growth rate estimations for every sample in df_gr_final.
errors: list of series for the samples that could not be estimated because of how croissance handles noisy data.
- tecan_od_analyzer.tecan_od_analyzer.gr_plots(df, sample, interpolationplot, color_=None, ind=False, legend_='bioshaker', title_='species', separate_species=False, flag_svg=False, flag_loff=False)¶
Generates a growth curve plot for a given series for common species, returns the plot.
- Parameters
df – dataframe containing differential times and OD measurements
sample – sample used
ind – flag that indicates whether to output individual plots (True) or merged plots by sample species (False)
- Returns
fig: object containing the figure.
plt.savefig: saves the figure as a png or svg file.
- tecan_od_analyzer.tecan_od_analyzer.input_output(cmd_dir, path)¶
Interprets the input arguments related to the path to the data and the output directory.
- Parameters
cmd_dir – name of the directory where the output will be sent
path – path where the data is located
- tecan_od_analyzer.tecan_od_analyzer.interpolation(od_measurements, df_annotations, mean_df_bs)¶
Interpolates the values of given od readings and returns growth rate measurements.
- Parameters
od_measurements – Dataframe containing the desired samples to estimate
df_annotations – Dataframe containing growth rate annotations of every sample
mean_df_bs – Dataframe containing the growth rate annotations grouped by common species and bioshaker
- Returns
od_measurements: the estimated OD measurements and whether each prediction lies in the model’s range
- tecan_od_analyzer.tecan_od_analyzer.parse_data()¶
Calls the autoflow_parser and returns a merged xlsx document with all the OD readings combined
- tecan_od_analyzer.tecan_od_analyzer.read_xlsx(filename='results.xlsx')¶
Reads .xlsx file, returns a dataframe with relevant variables. The output of the parser is set to be “results.xlsx”, the default reads the mentioned file without any additional argument
- tecan_od_analyzer.tecan_od_analyzer.reshape_dataframe(df_gr, flag_species=False, flag_bioshaker=False)¶
Collects the times belonging to every sample and creates a time column relative to a specific sample, returns the modified dataframe.
- Parameters
df_gr – dataframe containing growth rate measurements and differential time in hours.
flag_species – flag that corresponds to the presence of more than one species (True)
- Returns
If flag_species is False:
df_gr: dataframe with differential time measurements in hours displayed horizontally (one column containing the time measurements and one column containing the OD measurements per sample).
If flag_species is True:
df_gr_final: dataframe with differential time measurements in hours displayed horizontally (one column containing the time measurements and one column containing the OD measurements per sample).
df_gr_final_list: list of dataframes derived from df_gr_final and split by common sample species.
- tecan_od_analyzer.tecan_od_analyzer.sample_outcome(sample_file, df)¶
Uses an external file containing the purpose of each individual sample and returns two classified dataframes based on sample purpose, labelled by bioshaker.
- Parameters
sample_file – variable or string containing the name of the file and its extension.
df – dataframe obtained by using the read_xlsx method on the merged xlsx file.
- Returns
df_gr: dataframe containing observations related to the microbial growth rate, labelled by bioshaker.
df_vl: dataframe containing observations related to the volume loss estimation, labelled by bioshaker.
- tecan_od_analyzer.tecan_od_analyzer.stats_plot(summary_df, flag_svg=False)¶
Box plots of the growth rate annotation parameters by species and bioshaker.
- Parameters
summary_df – dataframe containing the annotation parameters
- Returns
call: string with the status of the plot creation
- tecan_od_analyzer.tecan_od_analyzer.stats_summary(df_annotations)¶
Generates a statistics summary of the growth rate annotations.
- Parameters
df_annotations – dataframe containing growth rate annotations
- Returns
summary_df: dataframe containing the summary statistics
- tecan_od_analyzer.tecan_od_analyzer.time_formater(df)¶
Takes a dataframe and turns date and time variables into differential time in hours for every bioshaker, returns modified dataframe.
- Parameters
df – dataframe containing date and time measurements.
- Returns
df_out: dataframe with differential time measurements in hours.
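The conversion to differential time can be sketched with the standard library. The example assumes that "differential time" means the hours elapsed since the first reading of the same bioshaker, which is an interpretation rather than something stated above. `differential_hours` is a hypothetical helper working on plain tuples rather than a dataframe.

```python
from datetime import datetime


def differential_hours(readings):
    """Map (bioshaker, timestamp) pairs to hours since that bioshaker's first reading."""
    first_seen = {}
    for bioshaker, timestamp in readings:
        if bioshaker not in first_seen or timestamp < first_seen[bioshaker]:
            first_seen[bioshaker] = timestamp
    return [
        (bioshaker, (timestamp - first_seen[bioshaker]).total_seconds() / 3600)
        for bioshaker, timestamp in readings
    ]


# Hypothetical timestamps for two bioshakers.
readings = [
    ("BS1", datetime(2020, 1, 1, 9, 0)),
    ("BS1", datetime(2020, 1, 1, 10, 30)),
    ("BS2", datetime(2020, 1, 1, 12, 0)),
    ("BS2", datetime(2020, 1, 2, 12, 0)),
]
elapsed = differential_hours(readings)
```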
- tecan_od_analyzer.tecan_od_analyzer.vol_correlation(df_vl)¶
Assesses the volume loss with OD450 measurements and correlates the OD450 readings to time for every bioshaker. Returns a correlation dataframe.
- Parameters
df_vl – dataframe containing only volume loss measurements.
- Returns
cor_df: dataframe containing correlation values of the volume loss according to time measurements.
Main module.
- tecan_od_analyzer.autoflow_parser.autoflow_parser.check_results_files(entries: list, files: set, max_timeflex: int)¶
Checks the results files against the logfile.
- tecan_od_analyzer.autoflow_parser.autoflow_parser.make_channel_dfs(merged_df: pandas.core.frame.DataFrame)¶
Make dataframes with only elapsed time (h) and samples for plotting
- tecan_od_analyzer.autoflow_parser.autoflow_parser.merge_results(entry_to_file: dict, path: str)¶
Combine measurement dataframes into one
- tecan_od_analyzer.autoflow_parser.autoflow_parser.parse_tecan_files(entry: str, file: str, path: str)¶
Method to parse tecan files and return pandas dataframe
- tecan_od_analyzer.autoflow_parser.autoflow_parser.process_logfile(path: str)¶
Check logfile and return list of entries.
Console script for autoflow_parser.
- tecan_od_analyzer.cli.cli.main(path, max_timeflex)¶
Console script for autoflow_parser.