Notice ID:  2032H8-24-N-00005

The Department of the Treasury/Internal Revenue Service (IRS) has a requirement to provide statistical products that safely allow researchers to perform statistical analysis using administrative tax data while protecting the confidentiality of taxpayer information. The project will develop a framework to produce synthetic public-use files on individual information derived from tax returns and third parties.

Purpose. The goal of this project is to continue the expansion of high-quality synthetic tax data and a validation process, two innovative statistical methodologies designed to allow and expand research access to administrative tax data while protecting privacy. Both are cutting-edge approaches. Over the last seven years, the new methods have been adapted to the large scale and complexity of tax datasets, and ways have been found to improve quality while protecting privacy. Using these methods, a synthetic supplemental public-use file (PUF) was created containing information about people who did not file income tax returns or have an obligation to file. Those data had never been publicly available before, but a synthetic version was produced that was safe to release. Together, the PUF and the supplemental file provide a more complete picture of the range of households in the US, including those with low incomes. A synthetic version of the 2012 PUF was created, and a 2013 synthetic PUF is currently being tested. Evaluation of the 2012 synthetic PUF and the methodological contributions embodied in it were the subject of articles in the National Tax Journal and statistical journals.

Scope. The work under this contract will build on the work performed under contracts TIRNO-17-C00062 and 2032H5-23-P-00106. The work on confidential data will be performed at IRS facilities or on IRS-provided laptops by cleared vendor staff who are trained on the use of IRS systems and subject to the laws governing their use. No information derived from confidential data may be released unless cleared by senior staff of the IRS Statistics of Income (SOI) division.

The contractor will continue to refine and improve the process of creating synthetic data files. The contractor will also begin training data science staff at SOI on the building and evaluation of synthetic data files and will support SOI in producing a public release of a fully synthetic PUF as a replacement for the traditional PUF. The contractor will develop documentation and training materials to support this effort suitable for publication and provide materials to support publication on the SOI website, if necessary, of extensive documentation and analysis of the synthetic PUF. The contractor will present its vision for deploying a validation process on the National Secure Data Service (NSDS) (or some other entity to be chosen by SOI) to SOI staff and receive feedback on data security plans and any other related issues that could be involved in a future implementation of the NSDS. The contractor, in collaboration with SOI, will continue to research and implement statistical methods for improving synthetic data while protecting privacy. For each of these, the contractor will provide regular updates on work status to SOI staff. The contractor will provide formal training sessions on how to generate the synthetic data on future PUFs to SOI data science staff and regularly follow up to provide guidance in developing and evaluating one or more synthetic data files. The contractor and SOI staff will make at least one presentation to SOI reporting on progress, and the contractor will report on status to at least one other panel of synthetic data privacy experts. The contractor will fully document all work.

The period of performance for this contract is one year, with four option years.  The anticipated period of performance for the base year is 07/01/2024 – 06/30/2025.
