Under Submission

How do Categorical Duplicates Affect ML? A New Benchmark and Empirical Analyses

Accepted Papers

Towards Benchmarking Feature Type Inference for AutoML Platforms
Vraj Shah, Jonathan Lacanlale, Premanand Kumar, Kevin Yang, Arun Kumar. SIGMOD 2021.

SpeakQL: Towards Speech-driven Multimodal Querying of Structured Data
Vraj Shah, Side Li, Kevin Yang, Arun Kumar, Lawrence Saul. SIGMOD 2020.

Demonstration of SpeakQL: Speech-driven Multimodal Querying of Structured Data
Vraj Shah, Side Li, Kevin Yang, Arun Kumar, Lawrence Saul. SIGMOD (Demo track) 2019.

The ML Data Prep Zoo: Towards Semi-Automatic Data Preparation for ML
Vraj Shah, Arun Kumar. SIGMOD DEEM Workshop 2019.

SpeakQL: Towards Speech-driven Multi-modal Querying
Vraj Shah. SIGMOD Student Research Competition (SRC) 2019.

Are Key-Foreign Key Joins Safe to Avoid when Learning High Capacity Classifiers?
Vraj Shah, Arun Kumar, Xiaojin Zhu. VLDB 2018.

SpeakQL: Towards Speech-driven Multi-modal Querying
Dharmil Chandarana, Vraj Shah, Arun Kumar, Lawrence Saul. HILDA Workshop, SIGMOD 2017.

GitHub’s Big Data Adaptor: An Eclipse Plugin
Ali Sajedi, Vraj Shah, Eleni Stroulia. IBM CASCON 2015.

Technical Reports

Towards Benchmarking Feature Type Inference for AutoML Platforms
Vraj Shah, Jonathan Lacanlale, Premanand Kumar, Kevin Yang, Arun Kumar.

SpeakQL: Towards Speech-driven Multi-modal Querying of Structured Data
Vraj Shah, Side Li, Arun Kumar, Lawrence Saul.

Are Key-Foreign Key Joins Safe to Avoid when Learning High Capacity Classifiers?
Vraj Shah, Arun Kumar, Xiaojin Zhu.