University to lead Midwest Big Data Hub

By Emily Scott

Every day, vast amounts of data are created, with sources ranging from cell phones and social media to scientific research. But because more data exists now than ever before, storing, accessing and sharing it poses more problems than ever.

To address this problem, the National Science Foundation announced on Nov. 2 the creation of four Big Data Regional Innovation Hubs. These hubs — the Northeast Hub, South Hub, Midwest Hub and West Hub — are a part of the White House’s “Big Data Research and Development Initiative,” which aims to improve the field of data science by implementing new strategies for managing big data.

The University of Illinois at Urbana-Champaign will lead the Midwest Big Data Hub, which will cover twelve states. The University of Michigan at Ann Arbor, the University of North Dakota, Iowa State University and Indiana University will also join the hub as partners. Numerous companies and institutions have also agreed to participate in the hub.

The Midwest Big Data Hub will provide organizational structures for effective data sharing. The aim is to make big data more shareable and accessible so that it can be distributed effectively, prompting new data infrastructures, collaborations and research findings.

Edward Seidel, director of the National Center for Supercomputing Applications (NCSA), helped coordinate the University’s proposal to lead the Midwest Hub. He is serving as the principal investigator for the Midwest Hub and the interim chair of its steering council. He said he sees the NCSA’s involvement as an opportunity that is “of strategic importance” for NCSA’s mission.


“NSF did something very interesting, they realized that there needs to be some organizational structures for communities to come together,” Seidel said. “Just the conception of how these organizational structures will be built, how they would be regional . . . that was all quite novel.”

The organizational structure behind the hubs will help researchers share data to benefit their own work and solve bigger problems. Kandace Turner, project manager for NCSA, said the biggest problem when it comes to data is transferring it.

“There’s more data now than there’s ever been, and it’s being produced much faster than it’s ever been produced,” Turner said. “And it’s not just research data. So (people) realize that there’s more data available, so that means that there’s more to learn, and more things that could possibly be solved. This was a great opportunity to move in that direction.”

Turner said researchers at one institution, such as a university, often have access to a data set from which other institutions could benefit greatly, but they have no way to transfer it.

Turner said the Midwest Hub will build “collaborations that maybe otherwise would not be built,” giving researchers at institutions across the Midwest access to resources, data sets and potential collaborators that were previously out of their reach. The Midwest Hub will serve as a facilitator for these collaborations.

“I think the biggest thing is that the University is really participating in sort of a ground-breaking movement,” Turner said. “It’s really been important. This whole connection of data resources has been missing.”

Klara Nahrstedt, professor of computer science and director of the Coordinated Science Laboratory at the University, knows the disconnect between data and researchers firsthand. Alongside Seidel, Nahrstedt led the preparation of the University’s proposal to head the Midwest Hub.

Nahrstedt has always been interested in problems surrounding data sharing and the difficulties that researchers, including her and her colleagues, face in getting access to data. For example, she said, it is nearly impossible for a data scientist to get access to the tremendous amounts of data held by big companies such as AT&T or Caterpillar.

She said there are typically two types of data: large amounts received at one time, such as from a telescope, and small amounts received frequently, which accumulate into larger data sets over time.

“We are not really doing a very good job on understanding, as a community, how to share data,” she said. But according to Nahrstedt, the challenge is not only understanding how to share data, but also how to license and price it.

Nahrstedt said the organizational structure of the Midwest Hub will help provide a data infrastructure that was previously missing. This structure includes several spokes, or concentration areas, specific to the region that the Midwest Hub will focus on.

Some of the spokes for the Midwest Hub are digital agriculture, smart cities and communities, and food-water-energy — which Nahrstedt will lead in connection with her related research.

Nahrstedt said she anticipates that the collaborations within the Midwest Hub will produce many new findings.

“I envision in the computing and network infrastructure, there’s going to be a lot of changes,” she said. She expects that new algorithm and systems designs will be “booming” as a result of the new methods for managing data.

With increased sharing of data sets, Nahrstedt also hopes to see new impacts on research. If two research areas, such as water and energy, have access to each other’s data, she said, researchers will be able to spot correlations more quickly and uncover trends they may not have known existed.

“If you allow people to suddenly look at the correlations, people will much faster find out — what’s the impact of my action?” Nahrstedt said. “If we could create these kinds of interdependencies, I think we could maybe have a much better understanding of ‘what if’ scenarios.”

Nahrstedt said this understanding could significantly shorten the time it takes to react to findings.

“Instead of reacting on a year’s basis, we can react on a monthly, daily basis,” she said. “We can see trends much finer . . . therefore saving lives, or improving the quality of life.”

Making an impact on real-world problems is exactly what the Midwest Hub aims to achieve, and Seidel said the potential application areas are countless. He said the NCSA’s expertise in computing and software will help communities come together around “grand challenge problems” in fields such as precision agriculture, medicine and manufacturing.

“We’re very excited about launching into this endeavor,” Seidel said. “I’m hopeful and confident that it will be successful and have a lot of impact in the communities and in the country.”
