Supporting Data-intensive Social Science

Yale's DISSC simplifies big data research by automating pipelines, enabling secure access, and optimizing massive datasets so researchers can focus on discovery, not infrastructure.

The Data-Intensive Social Science Center (DISSC) serves as a hub for social scientists at Yale who work with big data and secure datasets. The center provides support throughout the research lifecycle—assisting researchers in identifying appropriate data sources, establishing secure computing environments, and navigating applications for restricted data access.

A core aspect of DISSC’s mission is to ensure that large datasets are structured in ways that enable researchers to work with them efficiently. One notable example is the L2 voter database, which arrives as over 7 terabytes of raw data.

DISSC collaborated with the Yale Center for Research Computing (YCRC) to develop automated Nextflow pipelines that transform this massive raw dataset into optimized columnar formats, significantly improving query performance. An Apache Spark cluster was also set up so that processing can scale horizontally across machines, which is essential for datasets of this size.
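
To illustrate why a columnar layout tends to speed up queries, here is a minimal Python sketch. It is not the DISSC pipeline: the article does not name the target format, and the sample data, the column names, and the helpers rows_to_columns and count_where are all invented for illustration. The idea it shows is that once each column is stored together, a question about one field only has to scan that field rather than every full record.

```python
import csv
import io

# Hypothetical raw extract; the column names and values are invented.
RAW = """voter_id,state,birth_year
1,CT,1980
2,NY,1975
3,CT,1992
"""


def rows_to_columns(text):
    """Pivot delimited rows into a dict mapping column name -> list of values."""
    reader = csv.DictReader(io.StringIO(text))
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            columns[name].append(value)
    return columns


def count_where(columns, name, wanted):
    """Answer a single-column question by scanning only that column."""
    return sum(1 for value in columns[name] if value == wanted)


columns = rows_to_columns(RAW)
ct_count = count_where(columns, "state", "CT")
print(ct_count)
```

Production columnar formats add compression, typed encodings, and on-disk chunking on top of this basic idea; the sketch only shows the change of layout.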

In another pilot initiative, DISSC explored cloud-based querying for large-scale datasets. Working again with YCRC, the team established an ODBC connection that enables researchers to run queries against cloud-hosted datasets and retrieve results for local analysis at YCRC.
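
The query-remotely, analyze-locally pattern described above can be sketched as follows. This is an assumption-laden illustration, not the pilot's actual setup: an in-memory SQLite database from the Python standard library stands in for the cloud-hosted dataset, and the voters table, its columns, and the fetch_summary helper are hypothetical. An ODBC driver for Python would typically expose a similar connection and cursor interface, but the real connection details are not given in the article.

```python
import sqlite3


def fetch_summary(conn, state):
    """Run the aggregation where the data lives and return only the small result."""
    cur = conn.cursor()
    cur.execute(
        "SELECT birth_year, COUNT(*) FROM voters WHERE state = ? "
        "GROUP BY birth_year ORDER BY birth_year",
        (state,),
    )
    return cur.fetchall()


# Stand-in for the remote dataset: an in-memory SQLite database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE voters (voter_id INTEGER, state TEXT, birth_year INTEGER)"
)
conn.executemany(
    "INSERT INTO voters VALUES (?, ?, ?)",
    [(1, "CT", 1980), (2, "NY", 1975), (3, "CT", 1980), (4, "CT", 1992)],
)

# Only the aggregated rows come back for local analysis.
summary = fetch_summary(conn, "CT")
conn.close()
print(summary)
```

The design point is that filtering and aggregation happen on the server side, so only a compact result set has to travel to the researcher's local environment.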

DISSC’s overarching goal is clear: to allow researchers to concentrate on their research rather than on the complexities of data infrastructure.

Yale Center for Research Computing
