
Data Distribution

Posted: Tue Feb 11, 2025 4:11 am
by jrineakter
The problem of trust in journalism is one that continues to challenge our friends in the media industry. One of the best defenses in this area is providing data alongside news stories. To that end we are working hard to support journalists with a public platform to host and share their data, as well as facilitate discussions if they so choose. Whether the data is shared in public or with a private audience, our hope is that it can help improve the quality and accuracy of our media. Whether the work is on a national/global scale like the Associated Press, or a local effort like our support for the Texas Tribune, we're proud to provide services to any data journalist who needs them. I encourage you to read the case study on our work with the Associated Press if you want more details and inspiration here.

One of the key tenets of our Public Benefit Charter is the proliferation of open data. In support of this we're constantly working with individuals and groups to help their data reach as wide an audience as possible. In the past year, the number of datasets on data.world has more than tripled and the size of our global community of users accessing them has more than doubled. Whether it is scientific data, FOIA request data, public sector data, or data that is being released as corporate philanthropy, data.world is happy to help our community share data for greater impact. Some recent examples of this include:

IRS 990 data
The Nonprofit Open Data Collective received a grant to digitize and distribute information from the 990 form (financial information about nonprofit organizations). This was a massive undertaking that often meant working from scanned images of physical documents. The results of this labor are now available on data.world in machine-readable format.

DOE Simulation Data
We're working with a group from the Department of Energy to help set up a pipeline for their interactive simulation data from their supercomputing environment. This will stream data to the platform and expand the footprint of those who can build upon that research.

SXSW data
For SXSW 2019, we were given special permission to scrape and publish the schedule data. Ready-made templates using parameterized queries allowed community members to query by any keyword, venue, genre, or topic across over 4,000 speaker sessions and 1,700 music sessions! It is the only known public dataset of SXSW sessions and we took care to make it easy to use with no SQL or analytics experience required.
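For anyone curious what a parameterized query looks like in practice, here is a minimal sketch. It is not the actual data.world template or the real SXSW schema: the `sessions` table, its columns, the sample rows, and the `find_sessions` helper are all made up for illustration, and an in-memory SQLite database stands in for the platform. The idea is that the keyword is passed as a bound parameter rather than pasted into the SQL text, so a user only has to supply the search term.

```python
import sqlite3


def find_sessions(conn, keyword):
    """Return (title, venue) rows whose title contains the keyword.

    The keyword is bound through a `?` placeholder, so it is treated
    as data and never as part of the SQL statement itself.
    """
    query = "SELECT title, venue FROM sessions WHERE title LIKE ? ORDER BY title"
    return conn.execute(query, (f"%{keyword}%",)).fetchall()


# Hypothetical table and rows, used only to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (title TEXT, venue TEXT, genre TEXT)")
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?, ?)",
    [
        ("Open Data for Newsrooms", "Room A", "talk"),
        ("Knowledge Graphs 101", "Room B", "talk"),
        ("Late Night Showcase", "Stage C", "music"),
    ],
)

results = find_sessions(conn, "Data")
print(results)
```

The same pattern would extend to venue, genre, or topic by adding more placeholders; the person running the query only fills in values and never edits the SQL.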

Linked Data
data.world has long held the belief that just as the World Wide Web connects documents, which contain information rendered in human-readable natural languages, the web of the future will connect data. In support of this, data.world continues to strengthen and iterate on the graph model view of the world, whether through the underlying technology of the data.world platform, acquisition and support of projects like Capsenta (yes, we made our first acquisition in the past year and ZDNet has a great write-up here) and Gra.fo, or participation in industry events and conferences like the International Semantic Web Conference (ISWC). I recently wrote this article in CIO on why Knowledge Graphs can lead to so much data-driven culture change. The future of data is bright, and by linking data we can help it shine even brighter.