# Ideas

Debate contains a natural challenge dataset for transcription tasks. Events like LD and policy feature spreading (a portmanteau of “speed” and “reading”), where debaters attempt to speak as fast as possible while maintaining coherence. Create a web interface where debaters can interact with the existing Whisper API and “grade” their spreading. Get at least 100 hours of good spreading. Voilà! A benchmark.

## Progress Library ✅

A Python library that makes it easy to regularly save your work on a long job. If you’re processing 1M samples with some function, the pickler would dump your work into a pickle every X (say 10,000) runs.
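Here is a minimal sketch of what the core of such a library might look like. The function name `process_with_checkpoints`, its parameters, and the resume-from-file behavior are all my own assumptions for illustration, not an existing API.

```python
import pickle
from pathlib import Path


def _dump(results, path):
    # Write to a temporary file first, then swap it in, so a crash
    # mid-write cannot leave a half-written checkpoint behind.
    tmp = path.parent / (path.name + ".tmp")
    with tmp.open("wb") as f:
        pickle.dump(results, f)
    tmp.replace(path)


def process_with_checkpoints(samples, func, path, every=10_000):
    """Apply func to each sample, pickling all results so far every `every` items.

    If a checkpoint already exists at `path`, previously saved results are
    loaded and the matching samples are skipped (assumes the same sample order).
    """
    path = Path(path)
    results = []
    if path.exists():
        with path.open("rb") as f:
            results = pickle.load(f)

    for i, sample in enumerate(samples):
        if i < len(results):
            continue  # already processed in an earlier run
        results.append(func(sample))
        if len(results) % every == 0:
            _dump(results, path)

    _dump(results, path)  # final save once everything is done
    return results
```

Calling `process_with_checkpoints(samples, func, "work.pkl")` a second time after a crash would pick up from the last saved checkpoint instead of starting over.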

## Club Politics 🗳

How do you start a college club? What is a good leadership structure? What are the key ingredients to long-term survival? Which kinds of disasters end up sinking the club? Someone do a sociological study of clubs on campus!

## Implied Prestige ✨

A measure of the prestige of an employer: how often people leave other employers to come work there. You can measure this on a huge scale with a massive LinkedIn dataset. But as far as I know, such a dataset doesn’t exist yet (publicly). Anyone want to do some illegal scraping with me?
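To make the measure concrete, here is one way it could be computed from a list of job changes. The input format (pairs of `(from_employer, to_employer)`) and the choice to score each employer by its share of inbound moves are assumptions of this sketch; raw inbound counts or something PageRank-like would be just as defensible.

```python
from collections import Counter


def implied_prestige(moves):
    """Score each employer by inbound moves / (inbound + outbound moves).

    moves: iterable of (from_employer, to_employer) pairs, one per job change.
    """
    inbound, outbound = Counter(), Counter()
    for src, dst in moves:
        if src == dst:
            continue  # internal moves say nothing about relative prestige
        outbound[src] += 1
        inbound[dst] += 1

    employers = set(inbound) | set(outbound)
    return {e: inbound[e] / (inbound[e] + outbound[e]) for e in employers}
```

A score near 1 means people mostly arrive and rarely leave for another employer; a score near 0 means the opposite.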

## Enrollment Time is an Instrument ⏰

Berkeley enrollment is notorious. Students with late enrollment times, which are randomly assigned, often can’t get into the classes they want. The keyword *randomly* should make any economist smile. It provides a perfect instrument to test a variety of hypotheses related to high-demand classes. Does taking PE classes improve GPA? Does missing out on classes like DATA 100 leave students unprepared for later classes? Can we access and predict outcome variables like mental/physical health, life satisfaction, or career choice?
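For the simplest version of this, with a binary instrument (late vs. early enrollment time) and a binary treatment (got into the class or not), the textbook Wald estimator is enough. The sketch below assumes that setup; the variable names and the early/late split are hypothetical, and a real analysis would want standard errors and a check that the instrument actually moves enrollment.

```python
def wald_estimate(z, d, y):
    """Wald (binary-instrument) IV estimate of the effect of d on y.

    z: 1 if the student drew a late enrollment time, else 0 (the instrument)
    d: 1 if the student took the class, else 0 (the treatment)
    y: the outcome, e.g. a later GPA
    Assumes both z groups are non-empty.
    """
    def mean(values):
        return sum(values) / len(values)

    y1 = mean([yi for zi, yi in zip(z, y) if zi == 1])
    y0 = mean([yi for zi, yi in zip(z, y) if zi == 0])
    d1 = mean([di for zi, di in zip(z, d) if zi == 1])
    d0 = mean([di for zi, di in zip(z, d) if zi == 0])

    first_stage = d1 - d0  # how much the instrument shifts class-taking
    if first_stage == 0:
        raise ValueError("instrument has no effect on treatment")
    return (y1 - y0) / first_stage  # reduced form divided by first stage
```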

## Interactive Linear Models 📈

Linear models shouldn’t be that hard to work with. Even with scikit-learn, building a basic OLS model can cause headaches; building and testing new features is even more frustrating, requiring a whole lot of re-running code. A new concept: input a Pandas dataframe and instantly start playing around with linear models in a dynamic interface. Use checkboxes to enable or disable features. Easily engineer new features with buttons for transformations, interactions, boolean conditions, etc. Select your outcome(s) of choice, whether it be train accuracy, test accuracy, cross-validated loss, or so on. Can it be easily misused to produce spurious models? Sure, but so can existing tools. It does nothing more than speed up the existing data science lifecycle.

## P = ? 🧐

When I’m using BerkeleyTime (a student-run course catalog for Berkeley) to check course averages, I often find myself balking at the number of P’s and NP’s. For a class like CS 189, where fully 30% of all students opt to take the class P/NP, it’s quite hard to look at the left-skewed distribution and feel comfortable. Presumably, most of those P’s are actually Cs and Bs. I suspect that the letter grades underlying Ps are similarly distributed across classes, and that using the true distributions of Ps for select classes, we could reasonably estimate the real letter-grade distributions of class grades. Beyond helping lazy assholes like me select easy classes, such a tool could help us understand grade inflation, rates of cheating, the effect of e-learning during COVID (which coincided with lax P/NP rules), and so on.
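The estimation step itself would be simple once we have the letter-grade mix hiding behind the Ps. A rough sketch, assuming that mix is known from a few reference classes and carries over to the class in question (the function name and inputs here are hypothetical):

```python
def unmask_pass_grades(letter_counts, n_pass, pass_mix):
    """Spread a class's P grades over letter grades.

    letter_counts: {letter: count} of reported letter grades
    n_pass: number of P grades in the class
    pass_mix: {letter: weight}, the assumed mix of letter grades behind a P,
              estimated from classes where the true grades are known
    """
    total_weight = sum(pass_mix.values())
    estimated = dict(letter_counts)
    for letter, weight in pass_mix.items():
        share = weight / total_weight  # normalize so weights need not sum to 1
        estimated[letter] = estimated.get(letter, 0) + n_pass * share
    return estimated
```

The hard part is not this arithmetic but getting a trustworthy `pass_mix` and testing whether it really is stable across classes.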

## Speed Rubber Bands 🏎

When I’m driving and stuck behind a slow driver, I feel a kind of “distance debt” accumulating inside me. After passing the snail, I usually drive much faster than I normally would, in some sense making up for the lost distance. This should have implications for the design of speed bumps and slower zones. Do people respond by speeding up afterwards? Does that introduce externalities?

This is a space where there’s lots of room for product innovation. The current Strava-esque social network for weightlifting is BodySpace, which has a dreadful UI and what seems like a fairly inactive network. A smart fitness app that integrates with a bunch of other workout trackers (e.g. Strong) could gain a lot more traction.

Two intro CS courses at Berkeley, EECS 16A and EECS 16B, both save time by having students grade their own homework. The readers then grade a subset of the problems and scale the rest of the grades by the discrepancy between “official” grades and self grades. In theory, if students put a good-faith effort into self-grades, this is a cost-effective and usually-fair way of assigning grades. But I’m worried about (a) the extent and unfairness of noise and (b) the incentives to put an honest effort in self-grades. Under what conditions should students put in such an effort? And how unfair are these systems?
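To reason about noise and incentives, it helps to pin down a scaling rule. I don’t know the exact formula the courses use, so the sketch below is only one plausible reading: compute the ratio of official to self-assigned points on the audited problems, then apply that ratio to the unaudited ones.

```python
def adjusted_score(self_grades, reader_grades):
    """Combine reader grades on audited problems with scaled self-grades on the rest.

    self_grades: {problem: points} assigned by the student for every problem
    reader_grades: {problem: points} assigned by a reader for the audited subset
    """
    audited_self = sum(self_grades[p] for p in reader_grades)
    audited_official = sum(reader_grades.values())

    # Discrepancy ratio; fall back to 1.0 if the student self-assigned zero points.
    ratio = audited_official / audited_self if audited_self else 1.0

    unaudited = sum(pts for p, pts in self_grades.items() if p not in reader_grades)
    return audited_official + ratio * unaudited
```

Under this rule, a student who inflates every self-grade by the same factor gets scaled right back down, but one who inflates only on problems that happen not to be audited does not, which is exactly where the noise and the incentive questions come from.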

## Lazy Learners 😪

Every high school had them. Once they get into college, it’s all about maintaining the minimum GPA to not get rescinded. So many people learn only because they have to. That’s a problem, since few people disagree that intrinsic motivation is important for learning. To test this theory, we could proxy the role of extrinsic/intrinsic motivations to learn by comparing high school grades before and after students are admitted to college.

## No Heights on Zoom 💂

Tall people have an advantage in life, especially in terms of income. But research suggests that the underlying factor is nutrition: well-fed children are taller, smarter, and more charming. Still, I suspect that some of the effect is driven by bias (and I’m not just salty). COVID presents a natural experiment: comparing job prospects for tall people before and after going on Zoom could reveal just how much being tall helps you in an interview or in the workplace.

## Scientists and Surnames 👩🏽‍🔬

Adopting a partner’s surname is an important decision. Even more so for academics, who are often referred to as (Last Name, Year). Do female scientists and academics take a career hit after changing their surname? In particular, do newlyweds receive fewer citations on their new papers?

## Lane Switching 🚘

Investors trade too much and would often be better off holding on to their stocks. There are many explanations. Maybe they overreact to information, or excessively fear missing out. Do the same principles apply to switching lanes in heavy traffic? On a congested highway, I sometimes switch when I see a faster neighboring lane, only for that lane to slow down immediately after; I should just “hold” my current lane. What’s the optimal way to weave through rush hour traffic?

## Sleepy Hackers 😴

Hackathons are notorious for pressuring participants to stay up, often for over 24 hours at a time. Caffeine cookies, candies, waters, and more flow freely from the stands. To a reader of *Why We Sleep*, the ritual is a health catastrophe. So do the data bear that out? How do participants feel one day after the event? Two days? How do measures of cognitive function change? How many don’t even make it home, getting into accidents along the way?

## Zipcars and Speeding 🏎️

Zipcar operates on a reservation system, which means drivers have to book a time frame in advance. One night, I could only book 90 minutes for a trip to Oakland. What would’ve been a smooth trip became rushed after I had to stop to recharge my phone. I ended up heading back with only 20 minutes left. Although I knew better than to speed, I probably subtly went a little faster than I normally would. Are there more accidents and traffic violations towards the end of a reservation? Let’s get in contact with Zipcar and offer to do the analysis for them!

## Chipotle’s Cup Fee 🥤

I fill my water cup with soda. Chipotle doesn’t like it, so they recently decided to start charging 25¢ for water cups. But, as explored in the first chapter of Freakonomics, fees can often serve as moral license, crowding out intrinsic motivations. For those who didn’t steal soda on principle, the fee may empower them to abandon their moral righteousness. Since Chipotle supposedly studies soda-stealing, this is a question we could answer.

## Scoots and Steps 🛵

I used to be an adventurous one. I walked 10,000+ steps every day to get around Berkeley. Ever since I got my scooter, my daily steps have plummeted to somewhere around 2,000-3,000. The Mayo Clinic recommends 10,000 steps a day for healthy adults, so the substitution may actually have tangible health impacts. With nearly everyone keeping track of their own daily steps, could we gather data points from new scooter owners to measure the decrease in walking associated with buying a scooter?

## GYST DeCal 💩

A class about getting your shit together. Just the gyst on: health, personal finance, productivity, socializing, professional development, studying, and mental wellness. Brief guest lectures from professors and professionals. Practical, take-home guides for implementing advice. Grades based on measured outcomes: Did you make your calendar? Did you actually track your personal finance? Have you meditated in the past 3 days?