Speaker: Giles Hooker, Cornell University
Title: Robust and Efficient Analysis of Conditionally-Specified Models via Disparities
Abstract: Disparity based inference proceeds by minimizing a measure of disparity between a parametric family of density functions and a kernel density estimate based on observed i.i.d. data. For a class of disparity measures, of which Hellinger distance is one of the best known, minimum disparity estimates of parameters have the surprising property of being both robust to outliers and also statistically efficient.
This talk introduces three novel methods based on disparities. We develop two disparity based approaches for nonlinear regression settings, based either on a nonparametric estimate of a conditional density, or by considering the marginal distribution of residuals. Both approaches can be shown to achieve the same asymptotic variance as the MLE when the parametric model correctly describes the data. Moreover, adding an L1 penalty to the disparity objective function yields a robust version of the LASSO that retains its oracle properties.
We also demonstrate that disparities can be used to replace log likelihoods in Bayesian inference, allowing Markov chain Monte Carlo methods to be applied to obtain robust posterior distributions while retaining asymptotic posterior efficiency. Combining these approaches allows any component of a Bayesian hierarchical model to be made robust to outliers by replacing part of the complete data log likelihood with a disparity.
Time: 2:00 pm.
Place: M3 3127
Speaker: Anindya Bhadra, Texas A&M University
Title: Simulation-based maximum likelihood inference for partially observed Markov process models
Abstract: Estimation of static (or time constant) parameters in a general class of nonlinear, non-Gaussian, partially observed Markov process models
is an active area of research. In recent years, simulation-based
techniques have made estimation and inference feasible for these models
and have offered great flexibility to the modeler. An advantageous feature
of many of these techniques is that there is no requirement to evaluate
the state transition density of the model, which is often high-dimensional
and unavailable in closed-form. Instead, inference can proceed as long as
one is able to simulate from the state transition density - often a much
simpler problem. In this talk, we introduce a simulation-based maximum
likelihood inference technique known as iterated filtering that uses an
underlying sequential Monte Carlo (SMC) filter. We discuss some key
theoretical properties of iterated filtering. In particular, we prove the
convergence of the method and establish connections between iterated
filtering and well-known stochastic approximation methods. We then use the
iterated filtering technique to estimate parameters in a nonlinear,
non-Gaussian mechanistic model of malaria transmission and answer
scientific questions regarding the effect of climate factors on malaria
epidemics in Northwest India. Motivated by the challenges encountered in
modeling the malaria data, we conclude by proposing an improvement
technique for SMC filters used in an off-line, iterative setting.
Time: 3:00 pm.
Place: M3 3127
Speaker: Hedibert Lopes, University of Chicago
Title: Cholesky Stochastic Volatility
Abstract: Multivariate volatility has many important applications in finance, including asset allocation and risk management. Estimating multivariate volatility, however, is not straightforward because of two major difficulties. The first difficulty is the curse of dimensionality. For p assets, there are p(p+1)/2 volatility and cross-correlation series. In addition, the commonly used volatility models often have many parameters, making them impractical for real applications. The second difficulty is that the conditional covariance matrix must be positive definite for all time points. This is not easy to maintain when the dimension is high. In this paper, we develop a new approach to modeling multivariate volatility. We name our approach Cholesky Stochastic Volatility (CSV). Our approach is Bayesian and we carefully derive the prior distributions with an appealing practical flavor that allows us to search for simplifying structure without placing hard restrictions on our model space. We illustrate our approach by a number of real and synthetic examples, including a real application based on twenty of the S&P 100 components.
Time: 2:00 pm.
Place: M3 3127
Speaker: Lysa Porth, University of Waterloo
Title: A Reinsurance Portfolio Optimization Approach for Crop Insurance
Abstract: Some insurance firms are faced with the unique challenge of managing risks that are large, infrequent, and potentially highly correlated within geographic regions or across product lines. An example of this is crop insurance, which includes weather risk correlated within regions, and leads to a portfolio of risks with high variance. A solution to this problem is undertaken in this study, using a combination of self managed reinsurance pooling and private reinsurance in a portfolio approach that utilizes combinatorial optimization with a genetic algorithm (Model C). This approach takes advantage of the natural offsetting of risks across regions in order to reduce risk in a cost effective manner. In addition, the reinsurance pool is supplemented with private reinsurance for a select group of correlated risks that do not sufficiently offset, adding further diversification.
An asset liability management (ALM) portfolio approach is used to examine the entire crop insurance sector for Canada. This is the first study to focus on reinsurance pooling for an entire insurance sector in a country, and it uses all major crops from 1978-2009, across 10 regions (provinces). The ALM evaluation shows that using a combination of reinsurance pooling and private reinsurance in a portfolio approach, Model C, is an innovative risk management approach for industry to efficiently reduce reinsurance costs without incurring additional risk. Beyond crop insurance, the portfolio model developed in this study would also be useful for other large natural disaster and weather related insurance portfolios, and other portfolio applications.
Time: 4:00 pm.
Place: M3 3127
Speaker: Martin Lysy, Harvard University
Title: Building a Tall Bridge Over a Wide River: Batch Filtered Inference for Multivariate Diffusions
Abstract: Diffusion processes have been used to model a variety of continuous-time phenomena in Finance, Engineering, and the Natural Sciences. However, parametric inference has long been complicated by an intractable likelihood function. For many models, the most effective solution involves a large amount of missing data for which the typical Gibbs sampler can be arbitrarily slow. On the other hand, joint parameter and missing data proposals can lead to a radical improvement, but their acceptance rate tends to scale exponentially with the number of observations.
We consider here a novel method of dividing the inference process into separate data batches, each small enough to benefit from joint proposals. A filter bridges batch contributions for inference on the whole dataset, or river. We present an example using Heston's stochastic volatility model for financial assets, but much of the methodology extends to Hidden Markov and other State-Space models.
Time: 2:00 pm.
Place: M3 3127
Speaker: Ioana Cosma, University of Cambridge
Title: Data sketching for cardinality and entropy estimation
Abstract: Streaming data is ubiquitous in a wide range of areas from engineering and information
technology, finance, and commerce, to atmospheric physics, and earth sciences. The online
approximation of characterizations of data streams is of great interest, but this approximation
process is hindered by the sheer size of the data and the speed at which it is generated.
Probabilistic algorithms for processing streaming data construct and maintain sub-linear
representations, in a single pass over the data, from which target characterizations can be
inferred with high efficiency.
In this talk we consider the online approximation of two important characterizations
of data streams: cardinality and empirical Shannon entropy. We assume that the number
of distinct elements observed in the stream is prohibitively large, so that the vector of
cumulative quantities cannot be stored on main computer memory for fast and efficient
access. This talk will focus on two techniques that use pseudo-random variates to form low-dimensional
data sketches, using the methods of hashing, seeding, and random projections
to the stable distribution. We discuss various properties of our estimators such as relative
asymptotic efficiency, recursive computability, and error and complexity bounds. Finally,
we present results on simulated data and on real data sets from the Soufrière Hills volcano
located on the Caribbean island of Montserrat.
Time: 2:00 pm.
Place: M3 3127
Speaker: Helene Gehrmann, York University
Title: TBA
Abstract: TBA
Time: 4:00 pm.
Place: M3 3127
Speaker: Jun Li, Stanford University
Title: Differential Expression Identification and False Discovery Rate Estimation in RNA-Seq Data
Abstract: RNA-Sequencing (RNA-Seq) is taking the place of microarrays and becoming the primary tool for measuring genome-wide transcript expression. We discuss the identification of features (genes, isoforms, exons, etc.) that are associated with an outcome in RNA-Seq and other sequencing-based comparative genomic experiments. That is, we aim to find features that are differentially expressed in samples in different biological conditions or under different disease statuses. RNA-Seq data take the form of counts, so models based on the normal distribution are generally unsuitable. The problem is especially challenging because different sequencing experiments may generate quite different total numbers of reads, or sequencing depths. Existing methods for this problem are based on Poisson or negative-binomial models: they are useful but can be heavily influenced by outliers in the data. We introduce a simple, non-parametric method with resampling to account for the different sequencing depths. The new method is more robust than parametric methods. It can be applied to data with quantitative, survival, two-class, or multiple-class outcomes. We compare our proposed method to Poisson and negative-binomial based methods in simulated and real data sets, and find that our method discovers more consistent patterns than competing methods.
Time: 2:00 pm.
Place: M3 3127
Speaker: Kun Liang, University of Wisconsin-Madison
Title: Adaptive procedures for false discovery rate estimation and control
Abstract: Multiple testing has generated a surge of interest in recent years due to the wide availability of large and complex modern data sets. Much research has focused on false discovery rate (FDR) estimation and control, and adaptive procedures have attracted particularly growing attention. By incorporating good estimates of the proportion of true null hypotheses among all hypotheses, adaptive procedures have been shown to increase the power of detecting non-null hypotheses while maintaining the FDR. Most existing adaptive procedures rely on tuning parameters, which can be either assigned a priori (fixed) or estimated from data (dynamic). In this talk, I will first provide a finite sample proof of conservative point estimation for fixed adaptive FDR procedures. Then, I will present a general condition under which dynamic adaptive procedures can lead to conservative null proportion and FDR estimators. In addition, I will derive asymptotic results on FDR estimation and control for a class of dynamic adaptive procedures under some weak dependence condition. Applications of the FDR to high-throughput genomic data will be discussed at the end.
Time: 2:00 pm.
Place: M3 3127
Speaker: Sophia Su, University of Minnesota
Title: Envelope Models and Methods
Abstract: This talk presents a new statistical concept called an envelope. An envelope has the potential to achieve substantial efficiency gains in multivariate analysis by identifying and cleaning up immaterial information in the data. The efficiency gains will be demonstrated both by theory and example. Some recent developments in this area, including partial envelopes and heteroscedastic envelopes, will also be discussed. They refine and extend the enveloping idea, adapting it to more data types and increasing the potential to achieve efficiency gains. Applications of envelopes and their connection to other fields will also be mentioned.
Time: 3:00 pm.
Place: M3 3127
Speaker: Ruodu Wang, Georgia Institute of Technology
Title: Two topics on the multivariate dependence structure
Abstract: This talk will cover two topics related to the multivariate dependence structure. 1) Completely mixable (CM) distributions: the marginal distribution of identically distributed random variables having a constant sum is called a CM distribution. The concept and present research of the CM distributions will be introduced. The new technique can be used to solve existing problems of the Frechet class, including important questions in risk management such as finding the bounds on the variance, stop-loss premium or Value-at-Risk for the total risk with an unknown dependence structure among individual risks. 2) Testing high-dimensional (HD) covariance: testing covariance is of importance in many areas of statistical analysis, as well as in financial and actuarial practice. In general the testing problems become difficult when the dimension of the problem grows as the sample size increases. To overcome the difficulty brought by the HD phenomena, I will introduce an empirical likelihood method to test the covariance matrix by splitting the data into two groups. The asymptotic distribution of the new test statistic is independent of the dimension. The new method can be used to deal with two problems: testing precise covariance matrix and testing the banded structure of the covariance matrix.
Time: 10:30 am.
Place: M3 3127
Speaker: Chad Q, University of North Carolina
Title: Sparse Meta-Analysis With Applications to High-Dimensional Data
Abstract: Meta-analysis plays an important role in synthesizing scientific evidence from multiple studies. When the dimensions of the data are high, as in many genomic studies, it is desirable to incorporate variable selection into meta-analysis to improve model interpretation and prediction. Existing variable selection methods require direct access to raw data, but in practice, it is often difficult to collect the raw data from multiple genomic studies. We propose a new approach, Sparse Meta-Analysis (SMA), which performs variable selection for meta-analysis based solely on summary statistics and allows the effect sizes of each covariate to vary among studies. The selection consistency and asymptotic normality of the estimators are established. The empirical performance is assessed through simulation studies. An application to a set of genome-wide association studies is provided.
Time: 4:00 pm
Place: M3 3127
Speaker: Reza Ramezam, University of Waterloo
Title: On Statistics Anxiety: A Transactional Analysis Approach
Abstract: Transactional Analysis (TA), founded by Eric Berne in the late 1950s, is a psychological approach to studying interactions between individuals. This talk will provide a brief overview of how TA can be applied in teaching statistics courses to reduce the statistics anxiety experienced by many students. It will be delivered in three sections:
1) A brief introduction to Transactional Analysis, including the key concepts and main ideas surrounding the approach,
2) The common categories of instructors and students from a Transactional Analysis point of view, particularly the interactions between the instructor and student,
3) Discussing the Statistics Anxiety Rating Scale (STARS) and how TA can be used to deal with statistics anxiety.
The material in the first part of the presentation can be used to understand the dynamics of interactions between individuals in everyday life. Part two can be of interest to anyone who teaches at the university level, and part three focuses on teaching statistics courses.
Time: 3:00 pm
Place: M3 3127
Speaker: David Matthews, University of Waterloo
Title: Exact Nonparametric Confidence Bands for the Survivor Function
Abstract: A method to produce exact simultaneous confidence bands for the empirical cdf that was first described by Owen, and subsequently corrected by Jager and Wellner, is the starting point for deriving exact nonparametric confidence bands for the survivor function of any positive random variable. We invert a nonparametric likelihood test of uniformity, constructed from the Kaplan-Meier estimator of the survivor function, to obtain simultaneous lower and upper bands for the functional of interest with a specified global confidence level. The method involves calculating a null distribution and associated critical value for each observed sample configuration. However, Noe's recursions and the Van Wijngaarden-Dekker-Brent root-finding algorithm provide the necessary tools for efficient computation of these exact bounds. Various aspects of the effect of right censoring on these exact bands are investigated, using as illustrations two observational studies of survival experience among non-Hodgkin's lymphoma patients and a much larger group of subjects with advanced lung cancer enrolled in trials within the North Central Cancer Treatment Group.
Time: 4:00 pm
Place: M3 3127
Speaker: Phelim Boyle, University of Waterloo
Title: Madoff's Ponzi Scheme
Abstract: For many years Bernie Madoff delivered consistently high returns with amazingly low volatility. In December 2008 his operation was exposed as a giant
Ponzi scheme. It is estimated that the account value of his funds then was
sixty-five billion dollars. The Madoff scandal raises many questions. How did
he perpetrate such a massive fraud and fool so many people for so long? Why
did regulation and due diligence fail so miserably in one of the world's most
sophisticated capital markets? How did he select his fictitious returns? Why
did some investors receive more favorable returns than others? Are there any
simple tests that could have been used to check if his alleged returns were
too good to be true? Legal proceedings are currently under way to recover
some of the funds lost by his investors. This talk will discuss these and other
issues.
Time: 4:00 pm
Place: M3 3127
Speaker: Yingli Qin, Nanyang Technological University, Singapore
Title: Hypothesis testing for high-dimensional data
Abstract: This talk will cover two hypothesis testing problems in a high-dimensional framework. First, we propose a high-dimensional one-way MANOVA under a general linear model assumption. We establish the asymptotic normality of the proposed test statistic. The finite sample performance of the proposed test statistic is illustrated through simulation studies. An application to a well-known microarray data set is carried out to identify differentially expressed gene sets across multiple cancer subtypes. The second problem is to test whether two high-dimensional distributions are identical. We propose a test statistic based on empirical distribution functions, and its theoretical and finite sample simulation results will be presented.
Time: 3:00 pm
Place: M3 3127
Speaker: Bruce L. Jones, University of Western Ontario
Title: Credibility for Pension Plan Terminations
Abstract: In establishing demographic assumptions for pension plan
calculations, pension actuaries must decide on suitable
termination rates. These rates typically depend on age
and years of service, but may also depend on other factors
such as economic conditions.
Restricting our attention to terminations other than mortality, disability or retirement (i.e. resignations and firings), we investigate an approach to adjusting a standard termination table to reflect the experience of the plan and other variables. Actual to expected ratios are modeled using a generalized linear model, and a limited fluctuation approach is used to reflect the credibility of the plan experience.
Time: 4:00 pm
Place: M3 3127
Speaker: Qihe Tang, University of Iowa
Title: The Double-barrier Default of a Time-homogeneous Diffusion Model
Abstract: For two exogenously determined barriers a<b, a firm is considered as defaulted whenever its value either goes below level a or continuously stays below level b for c>0 units of time. Economic justifications for this concept of default are the US bankruptcy codes Chapter 7 (Liquidation) and Chapter 11 (Reorganization). We model the firm value by a time-homogeneous diffusion model and derive an explicit formula for the non-default probability. This talk is based on joint work with Bin Li and Xiaowen Zhou.
Time: 4:00 pm
Place: M3 3127
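As a rough illustration of the default rule stated in the abstract above, the following sketch checks a discretely sampled path of firm values against the two barriers. The function name `first_default_index`, the fixed sampling step `dt`, and the discrete-time reading of "continuously stays below level b" are assumptions made here for illustration only; the talk itself concerns a continuous-time diffusion and an explicit formula, neither of which is reproduced.

```python
from typing import Optional, Sequence


def first_default_index(path: Sequence[float], dt: float,
                        a: float, b: float, c: float) -> Optional[int]:
    """Return the index of the first observation at which default occurs, or None.

    Discrete-time approximation (an assumption of this sketch):
      * default if the value falls below the lower barrier a, or
      * default if the value has stayed below the upper barrier b
        for at least c units of time without interruption.
    """
    if not (a < b and c > 0 and dt > 0):
        raise ValueError("need a < b, c > 0 and dt > 0")
    below_since = None  # index at which the current excursion below b started
    for i, value in enumerate(path):
        if value < a:
            return i  # crossed the lower barrier
        if value < b:
            if below_since is None:
                below_since = i
            # time elapsed since the excursion below b began
            if (i - below_since) * dt >= c:
                return i  # stayed below b for c units of time
        else:
            below_since = None  # excursion ended, reset the clock
    return None
```

With a = 5, b = 9, c = 2 and unit time steps, the path 10, 8, 8, 8 defaults at index 3 (two time units spent below b), while 10, 8, 10, 8, 10 never defaults because each excursion below b is interrupted.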
Speaker: Emiliano A. Valdez, University of Connecticut
Title: Longitudinal Modeling of Insurance Claim Counts Using Jitters
Abstract: Modeling insurance claim counts is a critical component in the ratemaking process for property and casualty insurance. This article explores the usefulness of copulas to model the number of insurance claims for an individual policyholder within a longitudinal context. To address the limitations of copulas commonly attributed to multivariate discrete data, we adopt a "jittering" method to the claim counts which has the effect of continuitizing the data. Elliptical copulas are proposed to accommodate the intertemporal nature of the "jittered" claim counts and the unobservable subject-specific heterogeneity on the frequency of claims. Observable subject-specific effects are accounted for in the model by using available covariate information through a regression model. The predictive distribution together with the corresponding credibility of claim frequency can be derived from the model for ratemaking and risk classification purposes. For empirical illustration, we analyze an unbalanced longitudinal dataset of claim counts observed from a portfolio of automobile insurance policies of a general insurer in Singapore. We further establish the validity of the calibrated copula model, and demonstrate that the copula with "jittering" method outperforms standard count regression models. (This is joint work with Dr. Peng Shi of Northern Illinois University, USA.)
Time: 2:00 pm
Place: M3 3127
Speaker: Lei Sun, University of Toronto
Title: Incorporating Prior Information to Multiple Hypothesis Testing with Applications to Large-Scale Genetic Studies
Abstract: A central issue in high-dimensional genetic studies is how to assess statistical significance taking into account the inherent large-scale multiple hypothesis testing. To improve power, a number of studies have investigated the benefits of utilizing available prior information; however, the relative merits of different methods remain unknown. We focus on the stratified FDR control (Sun et al., 2006; Yoo et al., 2010) and weighted p-value method (Genovese et al., 2006; Roeder et al., 2006). The two approaches model the prior distinctively. Weighted FDR converts the available prior information to a test-specific weighting factor and adjusts the p-values accordingly. In contrast, stratified FDR divides tests into several disjoint strata based on the prior information and applies the FDR control separately in each stratum. We formulate the two approaches in one framework and show the trade-off between power and robustness by theoretical, simulation, and application studies. Robustness is consequential in applications, safeguarding against potential uninformative or even misleading prior information. To demonstrate the practical relevance of these methods, I will discuss two recent genome-wide association studies of Cystic Fibrosis modifier genes, in which over 500,000 genetic markers are investigated for lung functions in CF patients and the available prior is of quantitative nature (Wright et al. 2011, Nature Genetics 43:539-548), and for meconium ileus and the prior is of categorical nature (Sun et al. 2012, Nature Genetics, in press).
Time: 4:00 pm
Place: M3 3127
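The two ways of using prior information described in the abstract above can be sketched on top of the Benjamini-Hochberg (BH) step-up procedure. This is a minimal illustration under stated assumptions, not the speaker's implementation: the function names are hypothetical, BH is assumed as the underlying FDR procedure, and the weighted variant assumes the common convention of dividing each p-value by a positive weight after rescaling the weights to average one.

```python
from typing import Dict, Hashable, List, Sequence


def bh_reject(pvalues: Sequence[float], alpha: float) -> List[bool]:
    """Benjamini-Hochberg step-up procedure; returns one rejection flag per test."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0  # largest 1-based rank k with p_(k) <= k * alpha / m
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            cutoff = rank
    reject = [False] * m
    for i in order[:cutoff]:
        reject[i] = True
    return reject


def weighted_bh(pvalues: Sequence[float], weights: Sequence[float],
                alpha: float) -> List[bool]:
    """Weighted approach: divide each p-value by its (rescaled) weight, then run BH."""
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")
    mean_w = sum(weights) / len(weights)
    adjusted = [min(1.0, p / (w / mean_w)) for p, w in zip(pvalues, weights)]
    return bh_reject(adjusted, alpha)


def stratified_bh(pvalues: Sequence[float], strata: Sequence[Hashable],
                  alpha: float) -> List[bool]:
    """Stratified approach: run BH separately within each stratum."""
    groups: Dict[Hashable, List[int]] = {}
    for i, s in enumerate(strata):
        groups.setdefault(s, []).append(i)
    reject = [False] * len(pvalues)
    for idx in groups.values():
        flags = bh_reject([pvalues[i] for i in idx], alpha)
        for i, flag in zip(idx, flags):
            reject[i] = flag
    return reject
```

For the p-values 0.001, 0.02, 0.03, 0.5 at level 0.05, plain BH rejects the first three tests. Splitting them into strata A, A, B, B rejects only the first two, and weights 0.5, 0.5, 1.5, 1.5 reject the first and third, which shows how the prior can change the outcome in either direction.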
Speaker: Ming Lee, University of Waterloo
Title: Information Distance from a Question to an Answer
Abstract: The data and knowledge from the internet have helped us to build a cross-language question answering engine, RSVP. We were not just interested in building the system; we also wanted to study and justify a fundamental theory of information, naturally enabled by such big data. In this lecture we will explain the theory of information distance and how this theory can help us to compute the distance from a query to an answer candidate, to improve speech recognition, to properly classify queries, and to translate queries across languages.
Time: 3:30 pm
Place: DC 1302
Speaker: Douglas Schaubel, University of Michigan
Title: Estimating the average treatment effect on mean survival time when treatment is time-dependent and censoring is dependent
Abstract: We propose methods for estimating the average difference in restricted mean survival time attributable to a time-dependent treatment. In the data structure of interest, the time until treatment is received and the pre-treatment death hazard are both heavily influenced by a longitudinal process. In addition, subjects may experience periods of treatment ineligibility. The pre-treatment death hazard is modeled using inverse weighted partly conditional methods, while the post-treatment hazard is handled through Cox regression. Subject-specific differences in pre- versus post-treatment survival are estimated, then averaged in order to estimate the average treatment effect among the treated. Asymptotic properties of the proposed estimators are derived and evaluated in finite samples through simulation. The proposed methods are applied to liver failure data obtained from a national organ transplant registry.
Time: 4:00 pm
Place: M3 3127
Speaker: Ian McLeod, University of Western Ontario
Title: TBA
Abstract: TBA
Time: 4:00 pm
Place: M3 3127
Speaker: Juli Atherton, McGill University
Title: TBA
Abstract: TBA
Time: 4:00 pm
Place: M3 3127
Speaker: Ricardas Zitikis, University of Western Ontario
Title: TBA
Abstract: TBA
Time: 4:00 pm
Place: M3 3127
Speaker: John Boland, University of South Australia
Title: Reconciling rainfall modelling on differing time scales
Abstract: Rainfall models perform well for the time scales on which they are produced, but for other time scales a model's performance can be erratic. Previous experience in estimating five-minute variance of wind farm output derived from modelling ten-second output series (Agrawal et al., 2010) has led us to be able to perform a similar exercise with modelling daily rainfall series in a fashion that preserves the monthly variance. We develop an AR(3) model for daily rainfall (after multiplicative deseasoning), and through standard transformations for stationary time series, develop estimates for variances for monthly rainfall series that match the observed monthly variances. Extensions to other climate variables will also be discussed.
Time: 4:00 pm
Place: M3 3127
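One standard way to pass from a daily model to a monthly variance, in the spirit of the abstract above, is to write the variance of a sum of n consecutive values of a stationary series in terms of its autocovariances: Var(S_n) = n γ(0) + 2 Σ_{h=1}^{n-1} (n − h) γ(h). The sketch below applies this to an AR(p) model. It is an assumption-laden illustration rather than the speaker's method: the function names are hypothetical, the coefficients are assumed to define a stationary process (this is not checked), the autocovariances come from a truncated MA(∞) expansion, and the multiplicative deseasoning step is not modelled.

```python
from typing import List, Sequence


def ar_autocovariances(phi: Sequence[float], sigma2: float,
                       max_lag: int, n_terms: int = 2000) -> List[float]:
    """Autocovariances gamma(0..max_lag) of a stationary AR(p) process.

    Uses the truncated MA(infinity) weights psi_j, where psi_0 = 1 and
    psi_j = sum_k phi_k * psi_{j-k}; then gamma(h) = sigma2 * sum_j psi_j * psi_{j+h}.
    """
    p = len(phi)
    psi = [1.0]
    for j in range(1, n_terms + max_lag):
        psi.append(sum(phi[k] * psi[j - 1 - k] for k in range(min(p, j))))
    return [sigma2 * sum(psi[j] * psi[j + h] for j in range(n_terms))
            for h in range(max_lag + 1)]


def variance_of_sum(phi: Sequence[float], sigma2: float, n: int) -> float:
    """Variance of the sum of n consecutive values, e.g. a monthly total of n days."""
    gamma = ar_autocovariances(phi, sigma2, max_lag=n - 1)
    return n * gamma[0] + 2.0 * sum((n - h) * gamma[h] for h in range(1, n))
```

As a sanity check, with all coefficients zero (white noise) the variance of a 30-day total is 30 times the daily variance, while positive autocorrelation inflates it; for instance `variance_of_sum((0.5, 0.0, 0.0), 1.0, 2)` evaluates to 4 rather than the 8/3 that independence with the same marginal variance would give.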
Speaker: Jing Ai, University of Hawaii at Manoa
Title: TBA
Abstract: TBA
Time: 4:00 pm
Place: M3 3127
Speaker: Frank Konietschke, University of Goettingen
Title: TBA
Abstract: TBA
Time: 4:00 pm
Place: M3 3127
Speaker: Ping Yan
Title: Some stochastic processes as infectious disease models
Abstract: This presentation attempts to synthesize stochastic and deterministic disease transmission models and put them in statistical context, such as parameter estimation and prediction. It starts with brief reviews of the univariate and bi-variate birth-death Markov processes used as SIS and SIR models (Allen, 2011, second edition) and compares them against their deterministic counterparts. When the population is finite, the deterministic models are ordinary differential equation models that approximate the stochastic mean values of the birth-death processes, as an extreme case of the moment closure framework by setting the second order moments to zero. As the population size approaches infinity, the deterministic models can be also regarded as approximations to the ODE models for the stochastic mean values. Continuing, the presentation discusses the contexts under which the stochastic models do and do not “average out” to their deterministic counterparts in infinitely large populations. These further lead to the statistical implication for estimating parameters in disease models, where data, even in the perfect situation of continuous and complete observation, tend to arise from a single realization of the sample paths of an inherently stochastic event. The final part of the presentation will discuss the notion of “heterogeneity” in disease models and the proposal of formulating such a concept in terms of stochastic orders, along with some application results from recent publications in the study of the initial growth and evaluation of the effectiveness of certain infectious disease control measures.
Time: 4:00 pm
Place: M3 3127