Project goal:
Analyze (or reanalyze) some of your own real data, so that the result is immediately useful to you. If you don’t have any data of your own, try to find a dataset that interests you. Failing that, I can find a dataset for you to work on.
Required:
- load source data from a file
- plot at least one histogram of the data, with title and labelled axes
- create at least one plot of analysis results, with title and labelled axes
- use at least one numpy array
- use short but descriptive variable names in your code
- document your code: use markdown in your .ipynb and/or directly comment your python code with # or ''' or """
- try to be succinct, while keeping the code readable
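As a rough illustration of the required items, here is a minimal sketch that loads data from a file into a numpy array and plots a histogram with a title and labelled axes. The file name reaction_times.csv, the variable name rt, and the randomly generated values are all placeholders invented for this example; swap in your own data file and names.

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder data so that this sketch runs on its own.
# In your project, skip this step and load your real data file instead.
rng = np.random.default_rng(0)
np.savetxt("reaction_times.csv", rng.normal(350, 50, size=200), header="rt_ms")

# Load source data from a file into a numpy array.
# np.loadtxt skips lines starting with '#', which is how savetxt wrote the header.
rt = np.loadtxt("reaction_times.csv")

# Histogram with title and labelled axes
fig, ax = plt.subplots()
ax.hist(rt, bins=20)
ax.set_title("Reaction time distribution")
ax.set_xlabel("Reaction time (ms)")
ax.set_ylabel("Count")
fig.savefig("rt_hist.png")
```

In a notebook the figure is displayed inline; in a .py script, fig.savefig() writes it to disk so it can be submitted alongside the code.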
Plus do at least 6 of the following:
- use an if-elif-else clause
- use a for loop
- use a while loop
- write at least one function, include a docstring
- print out some results in at least one nicely formatted string, using the string operator % or the .format() method
- use at least one vectorized math operation on an array
- create a figure with multiple axes (i.e., use plt.subplots(nrows, ncols))
- do a statistical test, and show that the test's assumptions hold for your data
- manipulate and analyze data in a pandas series or dataframe
- use an image processing algorithm
- use a clustering algorithm
- use some other non-trivial algorithm: e.g. regression, curve fitting, signal analysis…
- version control your code using git: create a local repository and make at least 5 commits while developing your code
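To give a feel for how several of these optional items can fit together in a few lines, here is a hedged sketch covering a function with a docstring, a vectorized operation, a for loop, an if-elif-else clause, both string formatting styles, and a figure with multiple axes. The function names zscore and describe, the variable names, and the six data values are made up for illustration only.

```python
import numpy as np
import matplotlib.pyplot as plt


def zscore(x):
    """Return array x standardized to zero mean and unit standard deviation."""
    return (x - x.mean()) / x.std()  # vectorized: no explicit loop needed


def describe(zval):
    """Return a short text label for a single z-score."""
    if zval > 1:
        return "slow"
    elif zval < -1:
        return "fast"
    else:
        return "typical"


# Made-up placeholder values; use your own data here
rt = np.array([310.0, 342.0, 298.0, 405.0, 367.0, 351.0])
z = zscore(rt)

# for loop with % string formatting
for i, (t, zval) in enumerate(zip(rt, z)):
    print("trial %d: %.0f ms (z = %+.2f, %s)" % (i, t, zval, describe(zval)))

# .format() string formatting
summary = "mean = {:.1f} ms, std = {:.1f} ms".format(rt.mean(), rt.std())
print(summary)

# Figure with multiple axes
fig, axs = plt.subplots(nrows=1, ncols=2, figsize=(8, 3))
axs[0].plot(rt, "o-")
axs[0].set_title("Raw reaction times")
axs[0].set_xlabel("Trial")
axs[0].set_ylabel("Reaction time (ms)")
axs[1].plot(z, "o-")
axs[1].set_title("Standardized reaction times")
axs[1].set_xlabel("Trial")
axs[1].set_ylabel("z-score")
fig.tight_layout()
fig.savefig("rt_overview.png")
```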
Submission:
The class project can be submitted as either:
- a python script in a .py file, with figures saved separately as .png or .pdf files, and source data included as well
- a jupyter notebook (.ipynb) with embedded figures, optionally submit source data as well if possible
Submit directly to Martin, either on a USB stick or via email, or post the files somewhere and send Martin the link.
Due date:
July 30, 2019