Soumendra Lahiri, who joined the Arts & Sciences faculty this fall, will be installed Jan. 27 as the Stanley A. Sawyer Professor in Mathematics and Statistics.
How did you first become interested in mathematics and statistics?
I grew up in Kolkata, India, in the state of West Bengal. I enjoyed math from a relatively early age but became more seriously interested in high school. At the end of high school, students in India take entrance exams for the universities in the country, and it's extremely competitive. My impression was that I would not be admitted, so I felt challenged to give it a shot. But to my surprise, I ended up being admitted to the highly regarded Indian Statistical Institute – one of 52 students, out of more than 10,000 applicants – and so I went. I had some really great stats courses there that captured my full attention and cemented my interest in the subject.
Your research is very broad. Is there an underlying theme within your research? Or any types of problems to which you're typically attracted?
Although the topics may be different – it may be in spatial statistics, or in time series, or just independent data – I think the thing that connects everything is looking for properties that arise theoretically as you let a variable, say, the size of your data set, grow to infinity. So in many different forms you're trying to come up with a simplified view when the sample size is very large. In this sense, I'd say the theme in my work is limit theorems. Also, data science problems have become a big part of my research. With those problems, you're typically trying to build algorithms or develop methods to deal with data sets. Ultimately, these algorithms can be applied to problems that are of interest to people in many different fields.
Are there any research projects or applications of your research that you've found to be particularly interesting or that have had a real-world impact?
One instance ties into weather and mobile device usage. We worked on a project where we received crowdsourced mobile data and tried to implement some novel data science techniques. The data were things like ambient temperature, humidity, and other atmospheric variables. Our goal was to learn something about ambient temperature at a very fine scale and make predictions. The classical way that temperature forecasts are made is by using ground stations. But these stations are pretty sparse – they might be several tens of miles apart. If you look at crowdsourced data, though, anybody who is transmitting data is potentially a new ground station. One question is, how do you combine the ground station data with the crowdsourced data to come up with a final temperature field? First you have to address the problem that the crowdsourced data are very unstructured and very, very noisy. So we developed a method to assign a veracity score to each data point – those with higher scores are good-quality observations. Then we developed ways to combine the ground station data and the crowdsourced data using these veracity scores. And we've been able to verify that doing this improves the predictions.
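As a rough illustration of how veracity scores could be used to blend the two data sources, here is a minimal sketch. It is not the method described in the interview: it assumes a single location, scores between 0 and 1, and a plain weighted average, and the function name `combine_temperatures` and the `station_weight` parameter are hypothetical.

```python
def combine_temperatures(station_temp, crowd_temps, veracity_scores, station_weight=1.0):
    """Blend one ground-station reading with crowdsourced readings.

    Assumption for illustration: each crowdsourced reading is weighted by
    its veracity score (taken to lie in [0, 1]) and the station reading
    gets a fixed positive weight. The result is a plain weighted average.
    """
    if len(crowd_temps) != len(veracity_scores):
        raise ValueError("need one veracity score per crowdsourced reading")
    if station_weight <= 0:
        raise ValueError("station_weight must be positive")
    if any(score < 0 for score in veracity_scores):
        raise ValueError("veracity scores must be non-negative")

    weighted_sum = station_weight * station_temp
    total_weight = station_weight
    for temp, score in zip(crowd_temps, veracity_scores):
        weighted_sum += score * temp
        total_weight += score
    return weighted_sum / total_weight


# Made-up numbers: the third phone reading looks unreliable, so it has a low score.
estimate = combine_temperatures(20.0, [21.0, 20.5, 35.0], [0.9, 0.8, 0.05])
print(round(estimate, 2))
```

Because the implausible 35-degree reading carries a score of only 0.05, it barely moves the estimate away from the station value and the two trusted phone readings.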
Another recent project that I'm working on, and have really enjoyed, is developing some non-linear data science techniques for prediction models and forecasting. Often when you're trying to do prediction, you're using some time-series data – a set of observations indexed by time points. For example, if you look at stock price movement, or, again, temperature readings at a given place over time, these are time series. If there is a correlation between things that are happening today and the observations from immediately before and immediately after, we call this a serial correlation. This kind of correlation greatly changes some of the dynamics, and there are some challenges. Most of the theory that is in place only looks at linear formulas. So what we have developed, and are continuing to work on, is a framework that can incorporate non-linear functions – things where you might be able to square or exponentiate a variable. This is something totally new, and we have very precise results about how to tune the coefficients for the prediction. We can also tell under what circumstances doing this extra work to get non-linear prediction is better than just using linear predictors.
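To make the linear versus non-linear distinction concrete, the sketch below compares two one-step predictors on a simulated, serially correlated series: one uses only the previous value, the other also uses its square. This is an illustrative toy and not the framework described in the interview; the simulated model, its coefficients, and the helper names `simulate_series` and `one_step_mse` are all assumptions, and the coefficients are fitted by ordinary least squares.

```python
import numpy as np


def simulate_series(n, seed=0):
    """Simulate x[t] = 0.5*x[t-1] - 0.2*x[t-1]**2 + noise (hypothetical model).

    The noise is uniform on [-0.3, 0.3), which keeps the series within [-1, 1].
    """
    rng = np.random.default_rng(seed)
    noise = rng.uniform(-0.3, 0.3, size=n)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = 0.5 * x[t - 1] - 0.2 * x[t - 1] ** 2 + noise[t]
    return x


def one_step_mse(x, use_square, train_fraction=0.8):
    """Fit a one-step predictor by least squares; return held-out mean squared error."""
    prev, target = x[:-1], x[1:]
    columns = [np.ones_like(prev), prev]      # intercept and previous value
    if use_square:
        columns.append(prev ** 2)             # extra non-linear feature
    features = np.column_stack(columns)

    split = int(train_fraction * len(target))  # earlier part for fitting, later part for testing
    coef, *_ = np.linalg.lstsq(features[:split], target[:split], rcond=None)
    errors = target[split:] - features[split:] @ coef
    return float(np.mean(errors ** 2))


series = simulate_series(2000)
print("linear predictor MSE:   ", one_step_mse(series, use_square=False))
print("quadratic predictor MSE:", one_step_mse(series, use_square=True))
```

Whether the squared term helps on held-out data depends on how strong the non-linearity is relative to the noise. In this toy setup the gain may be small.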
What drew you to WashU, and how has St. Louis been for you so far?
St. Louis has been great, and I'm really enjoying the beauty of the campus and the city. My background is more theoretical, so being part of a true math department is something I thought would be a good change. I like to see other points of view, especially with respect to theory. And of course WashU has a high standard and reputation for research across all disciplines, so that was a big attraction for me. I have several ideas for interdisciplinary projects in the future.
Speaking of interdisciplinary projects, advanced statistical theory is becoming an increasingly popular tool for people across all disciplines. What advice do you have for researchers in other fields using these tools?
I think there is always a lag between what advancements are taking place in statistics and mathematics and what is being applied in other fields. I know a lot of applied scientists and engineers are aware of some of the most recent advances, but the best way to take advantage of the latest findings is to have some sort of collaboration. Instead of just doing some googling, I think it's much more effective if you can find a collaborator who is active in statistical research. They will know what is going on in the field, and it's a great way to transfer knowledge between disciplines. So find a statistician, tell them what your problems are, and maybe there will be a solution that already exists or maybe it will lead to something new.
Lastly, the average person spends about an hour and a half eating each day. What’s your favorite thing to eat? And when not eating or doing stats, what do you like to do?
Well, Bengalis have a staple diet of fish and rice, so that's my go-to. I love freshwater fish, and there seem to be a lot of good sources for that in Missouri. When I'm not eating, I enjoy walking and listening to music, both classical Indian music and Western music. I also like watching basketball and shows on Netflix. And also, I come from India, so I love cricket. Cricket fandom is like a cult.