.. _label-Examples-DataSample:

Data sample modeling
=====================================

Consider the Canadian temperature dataset in the figure below. These data are from :ref:`Ramsay & Silverman (2005) ` and are available for download here:

| http://www.psych.mcgill.ca/misc/fda/downloads/FDAfuns/Matlab/
|

The data are in **fdaMatlab.zip** in the **/examples/weather** folder.

.. _label-WeatherData:

.. plot::

    import numpy as np
    import matplotlib.pyplot as plt
    import power1d

    #(0) Load weather data:
    data = power1d.data.weather()
    y0   = data['Arctic']
    y1   = data['Atlantic']
    y2   = data['Continental']
    y3   = data['Pacific']
    Y    = [y0, y1, y2, y3]

    #(1) Plot:
    plt.figure(figsize=(10,4))
    ax0    = plt.axes([0.09,0.18,0.41,0.8])
    ax1    = plt.axes([0.57,0.18,0.41,0.8])
    AX     = [ax0, ax1]
    labels = 'Arctic', 'Atlantic', 'Continental', 'Pacific'
    colors = 'b', 'orange', 'g', 'r'
    x      = np.arange(365)
    for y,label,c in zip(Y, labels, colors):
        h = ax0.plot(y.T, color=c, lw=0.5)
        plt.setp(h[0], label=label)
        ### plot mean and SD continua:
        m,s = y.mean(axis=0), y.std(ddof=1, axis=0)
        ax1.plot(m, color=c, lw=3, label=label)
        ax1.fill_between(x, m-s, m+s, alpha=0.5)
    plt.setp(AX, ylim=(-35, 25))
    ### legend:
    ax1.legend(fontsize=10, loc=(0.35, 0.03))
    [ax.set_xlabel('Day', size=14)  for ax in AX]
    ax0.set_ylabel('Temperature (deg C)', size=14)
    ### panel labels:
    labels = '(a)', '(b)'
    [ax.text(0.03, 0.91, s, size=12, transform=ax.transAxes)  for ax,s in zip(AX, labels)]

The first step of power analysis is to create a model of the data that can be used to generate numerical data samples. Those data samples will later be generated repeatedly, over a large number of iterations, in order to numerically accumulate the probability distributions associated with the model.

In **power1d** there are four steps to creating a data sample model:

1. Create baseline geometry.
2. Create signal geometry.
3. Create a noise model.
4. Combine 1--3 in a DataSample model.

The first step is to create "baseline" one-dimensional (1D) geometry. If one has experimental data like those depicted above, it is easy to generate that baseline as a **power1d.geom.Continuum1D** as follows:

.. plot::
    :include-source:

    import power1d

    data     = power1d.data.weather()   # load data
    y        = data['Continental']      # extract one region
    m        = y.mean( axis=0 )         # mean continuum
    baseline = power1d.geom.Continuum1D( m )
    baseline.plot()

If we didn't have experimental data we could approximate the baseline using the geometric models in :ref:`label-API-geom`. In this case one possibility for modeling the baseline is to subtract a Constant value from a GaussianPulse as indicated below. This appears to create a reasonable approximation to the experimental data.

.. plot::
    :include-source:

    import matplotlib.pyplot as plt
    import power1d

    data     = power1d.data.weather()   # load data
    y        = data['Continental']      # extract one region
    m        = y.mean( axis=0 )         # mean continuum

    # approximate the experimental value using a Gaussian pulse:
    Q        = 365   # continuum size
    g0       = power1d.geom.GaussianPulse( Q , q=200 , fwhm=190 , amp=40 )
    g1       = power1d.geom.Constant( Q , amp=23 )
    baseline = g0 - g1   # subtract the geometries

    ax = plt.axes()
    ax.plot( m , label='Experimental mean', color='k')
    baseline.plot( ax=ax, color='g', linewidth=3, label='Model' )
    ax.legend()
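To quantify how close this geometric approximation is to the experimental mean, the two continua can also be compared numerically. The short check below is only a sketch: it assumes that geometry objects expose their node values through a ``value`` attribute (see :ref:`label-API-geom` for the actual interface).

.. code-block:: python

    import numpy as np
    import power1d

    data     = power1d.data.weather()              # load data
    m        = data['Continental'].mean( axis=0 )  # experimental mean continuum

    Q        = 365
    g0       = power1d.geom.GaussianPulse( Q , q=200 , fwhm=190 , amp=40 )
    g1       = power1d.geom.Constant( Q , amp=23 )
    baseline = g0 - g1

    # maximum absolute deviation between the model and the experimental mean
    # (assumes the geometry object exposes its values via a ``value`` attribute):
    print( np.abs( baseline.value - m ).max() )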
The second step is to create a signal that we wish to detect experimentally. Let's say that we hope to detect a broad increase in temperature over the summer months, with a maximum temperature increase of 5 deg C. That could be modeled as a second Gaussian pulse like this:

.. plot::
    :include-source:

    import matplotlib.pyplot as plt
    import power1d

    data     = power1d.data.weather()   # load data
    y        = data['Continental']      # extract one region
    m        = y.mean( axis=0 )         # mean continuum

    # approximate the experimental value using a Gaussian pulse:
    Q        = 365   # continuum size
    g0       = power1d.geom.GaussianPulse( Q , q=200 , fwhm=190 , amp=40 )
    g1       = power1d.geom.Constant( Q , amp=23 )
    baseline = g0 - g1   # subtract the geometries

    # add a signal:
    signal   = power1d.geom.GaussianPulse( Q , q=200 , fwhm=100 , amp=5 )

    ax = plt.axes()
    ax.plot( m , label='Experimental mean', color='k')
    baseline.plot( ax=ax, color='g', linewidth=3, label='Model' )
    (baseline + signal).plot( ax=ax, color='r', linewidth=3, label='Signal' )
    ax.legend()

The third step is to create a model of the noise we expect to observe experimentally. From the :ref:`weather data <label-WeatherData>` above we can see at least two types of noise: high-frequency day-to-day noise and lower-frequency noise across the different 1D observations. In **power1d** a variety of :ref:`noise models ` are available to model these processes. Here let's model the noise as a combination of uncorrelated Gaussian noise and SmoothGaussian noise:

.. plot::
    :include-source:

    import matplotlib.pyplot as plt
    import power1d

    J      = 8     # sample size
    Q      = 365   # continuum size
    noise0 = power1d.noise.Gaussian( J , Q , mu = 0 , sigma = 0.3 )
    noise1 = power1d.noise.SmoothGaussian( J , Q , mu = 0 , sigma = 3 , fwhm = 70 )
    noise  = power1d.noise.Additive( noise0 , noise1 )

    fig,axs = plt.subplots(2, 2, tight_layout=True)
    axs[1,1].set_visible( False )

    # plot noise objects:
    noise0.plot( ax=axs[0,0] )
    noise1.plot( ax=axs[0,1] )
    noise.plot( ax=axs[1,0] )
    plt.setp(axs, ylim=(-9, 9))

    # add panel labels:
    labels = 'Gaussian', 'SmoothGaussian', 'Additive'
    for ax,s in zip( axs.ravel() , labels ):
        ax.text(0.05, 0.9, s, transform=ax.transAxes, bbox=dict(facecolor='w'))

Before we add that noise to our model, let's consider these noise models briefly. Each noise model has an attribute *value* which contains a (J x Q) array representing the current state of the random noise. New random noise can be generated using the *random* method, and the noise values can be replicated using NumPy's **np.random.seed** function as follows:

.. plot::
    :include-source:

    import numpy as np
    import matplotlib.pyplot as plt
    import power1d

    J = 3
    Q = 5

    fig,axs = plt.subplots(2, 2, tight_layout=True)
    axs[1,1].set_visible(False)

    # (a) generate noise object:
    np.random.seed(12345)
    noise = power1d.noise.Gaussian(J, Q, mu=0, sigma=1)
    noise.plot( ax=axs[0,0] )

    # (b) create new random data:
    noise.random()
    noise.plot( ax=axs[0,1] )

    # (c) reset noise to its original state:
    np.random.seed(12345)
    noise.random()
    noise.plot( ax=axs[1,0] )

    # add panel labels:
    labels = '(a) Original noise', '(b) New noise', '(c) Reset using np.random.seed'
    for ax,s in zip( axs.ravel() , labels ):
        ax.text(0.05, 0.9, s, transform=ax.transAxes, bbox=dict(facecolor='w'))
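This behaviour can also be verified numerically rather than visually: after reseeding, regenerating the noise reproduces the original *value* array exactly. For example:

.. code-block:: python

    import numpy as np
    import power1d

    J, Q = 3, 5

    np.random.seed(12345)
    noise = power1d.noise.Gaussian(J, Q, mu=0, sigma=1)
    a = noise.value.copy()        # (J x Q) array: the current noise state

    np.random.seed(12345)
    noise.random()                # regenerate using the same seed

    print( np.array_equal( a , noise.value ) )   # True: the values are replicated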
This noise control is identical for all types of :ref:`noise models ` including compound types like the Additive model above.

The fourth and final step is to combine our baseline, signal and noise objects into a DataSample object like this:

.. plot::
    :include-source:

    import numpy as np
    import matplotlib.pyplot as plt
    import power1d

    J = 8     # sample size
    Q = 365   # continuum size

    # construct baseline geometry:
    g0       = power1d.geom.GaussianPulse( Q , q=200 , fwhm=190 , amp=40 )
    g1       = power1d.geom.Constant( Q , amp=23 )
    baseline = g0 - g1   # subtract the geometries

    # construct signal geometry:
    signal   = power1d.geom.GaussianPulse( Q , q=200 , fwhm=100 , amp=5 )

    # construct noise model:
    noise0   = power1d.noise.Gaussian( J , Q , mu = 0 , sigma = 0.3 )
    noise1   = power1d.noise.SmoothGaussian( J , Q , mu = 0 , sigma = 3 , fwhm = 70 )
    noise    = power1d.noise.Additive( noise0 , noise1 )

    # create data sample model:
    model    = power1d.models.DataSample( baseline, signal, noise, J=J )

    # visualize:
    fig,axs = plt.subplots(2, 2, tight_layout=True)
    axs[1,1].set_visible( False )

    np.random.seed(12345)
    model.random()
    model.plot( ax=axs[0,0] )

    model.random()
    model.plot( ax=axs[0,1] )

    np.random.seed(12345)
    model.random()
    model.plot( ax=axs[1,0] )

    plt.setp(axs, ylim=(-25, 40))

    # add panel labels:
    labels = '(a) Original model state', '(b) New state', '(c) Reset using np.random.seed'
    for ax,s in zip( axs.ravel() , labels ):
        ax.text(0.05, 0.9, s, transform=ax.transAxes, bbox=dict(facecolor='w'))

In the figure above we see that, like the noise models, the DataSample model generates new random data samples using the *random* method, and its random state can be controlled using **np.random.seed**.

This instantaneous state of a DataSample represents the type of data we expect to observe in a single experiment **if** the baseline, signal and noise models are all accurate. If they are accurate, we can numerically generate a large number of experimental datasets, as if we had conducted a large number of experiments. Moreover, and most importantly, we can compare one model to another. Specifically, by comparing a model with no (i.e., null) signal to one with a modeled (non-null) signal, we can calculate the power with which we will be able to detect that signal given our noise model. This type of power analysis is described in :ref:`the next section `.
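As a rough preview of that idea, the sketch below uses plain NumPy (not power1d's own simulation tools, which are introduced in the next section) to estimate power by comparing null and signal models. The signal shape, the uncorrelated noise with a standard deviation of 3, the use of the maximum one-sample t statistic, and the iteration and alpha values are all illustrative assumptions, not power1d defaults.

.. code-block:: python

    import numpy as np

    np.random.seed(0)

    J     = 8       # sample size
    Q     = 365     # continuum size
    niter = 1000    # number of simulated experiments
    alpha = 0.05    # Type I error rate

    # illustrative broad signal (roughly mimicking the summer temperature increase):
    q      = np.arange(Q)
    signal = 5 * np.exp( -0.5 * ( (q - 200) / 50.0 )**2 )

    def max_t(y):
        # one-sample t statistic at each continuum node, then its maximum
        t = y.mean(axis=0) / ( y.std(axis=0, ddof=1) / np.sqrt(J) )
        return t.max()

    # (a) null experiments (noise only): distribution of the maximum t statistic
    t0 = np.array([ max_t( 3 * np.random.randn(J, Q) )          for i in range(niter) ])
    # (b) signal experiments (signal + noise): maximum t statistic in each experiment
    t1 = np.array([ max_t( signal + 3 * np.random.randn(J, Q) ) for i in range(niter) ])

    tstar = np.percentile( t0, 100 * (1 - alpha) )   # simulated critical threshold
    power = ( t1 > tstar ).mean()                    # estimated power
    print( power )

Note that this sketch uses uncorrelated noise; smooth 1D noise like the SmoothGaussian model above changes the distribution of the maximum statistic considerably, which is exactly the kind of effect that power1d's numerical approach is designed to capture.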