Frequently Asked Questions
This is really intimidating. Do you expect all students to have read this before asking you a question?
No! I've put this here for a couple of reasons:
- I get a lot of e-mail. I want to get less e-mail, so I'm hoping that if people find the answer here, they'll send me less e-mail.
- I get a lot of e-mail. I want to answer my e-mail quickly, but sometimes I don't. I'm hoping that if people find the information they need here, they won't have to wait for me to answer.
- Some students are shy about asking questions; if they can find an answer here that they were too afraid to ask, then everyone is better off.
- I want to make sure I treat students fairly. If I put policies/expectations out here publicly, it gives me less leeway for my latent biases to impact students.
I want to work with you, and I'm currently a grad student at the University of Maryland. How do I do that?
At UMD, the matching process between professor and student typically happens during the first year (not before). This allows the student to get comfortable and set up within the university, adjust to the region, and figure out how much time they have to devote to research. It also lets you get a sense of the research going on and of the personalities of the various professors/groups.
If you want to work specifically with me, then take computational linguistics or machine learning (because of the huge number of students who want to work with me, this is non-negotiable; it will teach you what you need to know and will provide me a good sense of your abilities). Once you've done that, send me an e-mail with a high-level view of what sorts of things you're interested in and your courseload for when we'd be working together (I don't want to work with students who have no time for research).
Then schedule a meeting, and we'll figure out a project to work on together to show that you're able to work independently on a self-contained project. After we finish the project, we can discuss longer-term arrangements.
I want to work with you, and I'm currently an undergrad student at the University of Maryland. How do I do that?
First, take either the undergraduate natural language processing course or the undergraduate machine learning course. In other words, you first need to learn how to program and learn some specialized skills. But try to take these courses as quickly as you possibly can!
Then, send me an e-mail. I will send you a challenge problem to complete in about a week. If you don't have a week to work on a challenging problem, then wait to e-mail me when you have some time to work on such a problem (warning: if I'm having a particularly busy week, I may not get back to you quickly, as putting together a challenge problem takes some time too).
You'll need to sign up to work with me as an independent study. I'll also ask you to have a fairly light course load the semester you work with me, as undergraduate students have a tendency to take on too much.
I want to work with you, and I'm not currently a student at Maryland. How do I work with you?
Then you should apply to be a student at Maryland. The best way would be to apply to computer science and mention me specifically in your application. After you submit an application, please drop me an e-mail (put GRADAPP-20XX in the subject) with your CV letting me know you applied. I may not reply, but it's still very useful!
To help me quickly search for such e-mails (and to show that you've done your homework by reading this FAQ), please put GRADAPP-20XX in the subject line, where XX is the year you hope to enroll.
See the openings page.
Are PhD students at Maryland funded?
The University of Maryland, like all top American universities, makes a commitment to fund PhD students so long as they're making adequate progress. This includes supporting tuition, a stipend, and health insurance. This is typically through a combination of research assistant positions and teaching assistant positions (my students typically TA once or twice).
Can I mention you in my statement of purpose?
You don't need to specifically ask my permission to list my name in a statement of purpose so long as our interests are a good match. If you think they are, go ahead! (However, it may not help you to list me if I'm not taking students in a particular year.)
Do you have any postdocs available?
I'm fairly junior, so I'm trying to fund students right now. I'm fairly good about keeping my webpage updated, so see the openings page.
I want to work with you, and I'm currently a student at Colorado
I moved to the University of Maryland in August 2017. I will no longer be advising new students at the University of Colorado.
Can I work with you as an intern?
Unfortunately, it's very hard to evaluate the quality of candidates without a formal system (e.g., as we have for university admissions). As a result, it is my policy only to work with people directly recommended to me by a professor or researcher with whom I already have a relationship.
Will I get admitted? Why was I rejected?
I will not answer this sort of question. Don't even bother asking. I cannot give opinions on whether you will get accepted without seeing a full application. There are numerous venues where you can get uninformed opinions about your chances. In any given year, which students I accept depends on funding amounts, match between project and students, and who says yes or no and when. It's a very stochastic process, and I wish we had a more logical system.
Where should I do my PhD?
The most important thing is finding a PhD program where you will be happy. Hopefully that will be UMD; students often put too much weight on rankings. Don't ignore rankings, but pay attention to the people involved and your fit with the group.
If you absolutely must look at rankings, I think CSRankings.org provides the least bad rankings available.
Can I do a PhD with you online?
A PhD is about learning how to be a researcher, and it's difficult to do that online, in part because much of what you learn is from your peers, not from a professor. Unless there's a very special circumstance (e.g., you're physically working with one of my existing collaborators), an online PhD is not workable. Even if it might work, it puts you at a competitive disadvantage to other applicants who are willing to physically be present.
You asked me to do a virtual interview after I applied for a PhD position. What does that mean and what should I do to prepare?
First, it means that you really stood out in the pool of applicants! I typically only interview five to ten applicants a year to select the candidates I will eventually invite to attend.
In many ways, it's a sanity check. If you say in your application that you're really good at X and you want to do Y, I'll ask about those things in a little more detail to better understand your background and your skills. I'll also ask about what you want out of a PhD program.
This really is a two-way conversation, however. We're going to work with each other for N years, and we both need to be sure that we can stand each other and work well together. So it's important for candidates to ask whatever questions they're concerned about too.
Can you give me a letter of reference?
I typically only write letters for students whose committee I've served on, whom I've worked on a research project with, or who did very well in my class. Unless you are my direct advisee, you must ask me before giving my name out.
When you ask, please send a list of bulleted points that answers the following questions: how we know each other (e.g., took class X, received grade Y, completed project on Z) and what research we have worked on (what the project was about, where it was published, your role in the project). Good rec letters contain details, and the more details you can provide that I can then surround with context, the better your letter will be.
For example, if I relied on my memory to write a letter of recommendation, I would be able to say something like "Susan took my class and did great, she did a project on music stuff". That's not as good as "Susan took my class Fall 2015, earned an A, and presented a final project on distinguishing musical styles automatically given the waveform of a song. Their group used a variety of techniques (support vector machines, convolutional neural nets, and k-nearest neighbors) to decrease the error rate of a strong baseline from 0.4 to 0.2". Obviously the second one is better, but I can't recall all of the details myself. Your bullet points will help me recall details and put your work into context.
What are your expectations / preferences in terms of what a student should know?
I personally like C++ and Python, but the culture here leans to Java, which I've been using more and more (and likely will continue to). I prefer writing tests to debugging, but debugging is a necessary evil. I do like reinventing the wheel somewhat to keep things self-contained and consistent, but I contribute the result to things like NLTK so that other people don't have to do the same. I also like using style checkers and the like to keep myself organized. (Though I say this, you can get a more honest picture of my coding style by looking at what I've actually written.)
Students who want to work with me should
- have basic knowledge of Python, C++, or Java (e.g. be able to write a dynamic program in that language),
- understand probability (Bayes rule, conditional probabilities, smoothing),
- compile LaTeX documents using BibTeX, and
- use version control software (e.g. git or svn)
These are the bare minimum requirements. If you do not meet these requirements, please take some classes to acquire these skills (preferably mine!) before asking to collaborate on research.
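As a rough illustration of the "write a dynamic program" bar mentioned above, here is a minimal sketch in Python. The choice of edit distance as the example problem and the function name `edit_distance` are assumptions made for illustration; they are not a required exercise.

```python
def edit_distance(source: str, target: str) -> int:
    """Levenshtein distance between two strings via dynamic programming."""
    rows, cols = len(source) + 1, len(target) + 1
    # dist[i][j] = fewest edits needed to turn source[:i] into target[:j]
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete every character of source[:i]
    for j in range(cols):
        dist[0][j] = j  # insert every character of target[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,                 # deletion
                dist[i][j - 1] + 1,                 # insertion
                dist[i - 1][j - 1] + substitution,  # match or substitution
            )
    return dist[-1][-1]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
```

The table is filled row by row so that each cell only depends on cells that have already been computed, which is the core idea behind any dynamic program.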
You should already code in some language pretty well, and conforming to my coding style will increase the probability that I'll be more hands-on in helping you code and debug, but if you want to program in LISP or Prolog, that's perfectly fine too, as long as it works for you.
Being comfortable with probability is probably the more important requirement. You'll likely have to deal with messy probability distributions, take expectations, derive conditional distributions given a joint distribution, implement dynamic programming to sample from PCFG grammars, do Taylor approximations, do some optimizations, etc. This shouldn't be taken as a laundry list of things you should know (it's great if you do) but just as a heads up of the kinds of things you might run into; part of a graduate education (life, for that matter) is learning new stuff. There will be many opportunities to learn: from classes, your peers, and reading group.
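To make "derive conditional distributions given a joint distribution" and "take expectations" concrete, here is a minimal sketch over a tiny discrete joint distribution. The variables, the probabilities, and the `steps` values are all made up for illustration, and the function names are hypothetical.

```python
from collections import defaultdict

# Toy joint distribution p(weather, activity); the numbers are invented.
joint = {
    ("sunny", "hike"): 0.3,
    ("sunny", "read"): 0.2,
    ("rainy", "hike"): 0.1,
    ("rainy", "read"): 0.4,
}


def marginal_first(joint):
    """p(x) = sum over y of p(x, y)."""
    totals = defaultdict(float)
    for (x, _y), prob in joint.items():
        totals[x] += prob
    return dict(totals)


def conditional_second_given_first(joint, x):
    """p(y | x) = p(x, y) / p(x)."""
    p_x = marginal_first(joint)[x]
    return {y: prob / p_x for (xi, y), prob in joint.items() if xi == x}


def expectation(dist, value):
    """E[value(Y)] under a discrete distribution given as {outcome: prob}."""
    return sum(prob * value(outcome) for outcome, prob in dist.items())


if __name__ == "__main__":
    given_rain = conditional_second_given_first(joint, "rainy")
    steps = {"hike": 10000, "read": 2000}  # hypothetical per-activity values
    print(given_rain)
    print(expectation(given_rain, steps.get))
```

Real research problems involve much messier distributions, but the same two operations (normalize a slice of the joint, then take a weighted sum) show up again and again.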
I think attending (and contributing to) a reading group or two is critical for learning about a field and being a good scholar. It's fun and not a chore at all, but I want to be up front in saying that any student of mine should be an active participant in a reading group or two. Don't just show up: you need to present papers and be involved in the discussion of every paper. If you didn't understand a paper, ask smart questions until you do. If you did understand a paper well, answer other people's questions.
Reading groups are also important for being able to "look smart" when you're interviewing. You'll need to be able to connect your work to what other people do. A reading group lets you know how your thesis connects to other research topics and talk intelligently about them. Unfortunately, this can't be done quickly; it requires dedication over many years to learn about the breadth of research that folks explore. So while you might feel like skipping reading group once is a good decision to get more work done, it's ultimately a bad decision because you need to consistently go to understand a broad range of topics.
How do you interact with students?
I like to have a group meeting every other week with students I'm working with (broadly construed), and one-on-one meetings as needed with students. I use Google calendar to set up my appointments, so students can grab a meeting whenever they need to. I expect students working with me full time to meet with me on average once a week (sometimes much more, such as before a paper deadline, and sometimes less). I use this online system so that my meetings are contiguous and that students always know when I'm available (and I can change things without e-mail). Students should sign up for a meeting at least 24 hours in advance. It's okay to schedule meetings outside of that time, but that should be the exception (I try to maximize the amount of contiguous time I have to research, write, and think).
In addition, everyone in my group (me included) sends a weekly e-mail to everybody saying:
- What they worked on that week
- What they plan to work on next week
- Anything that's holding them up or blocking their progress
Anyone who is working for me full time or who is my direct advisee must send me such an e-mail (with the subject [Snippet YYYY MM DD]) sometime Sunday evening or Monday morning. I find that this is very helpful because I sometimes ask myself (or have funding agencies ask me) what I (and my students) did in a particular time period. These e-mails really help me figure that out without bugging other people. It also helps me stay productive by setting realistic goals; I use this weekly todo list to populate my daily todo list.
So what makes a good goal? You should have "Big Picture Goals" that carry over from week to week; these are often at the level of something you want to make happen this year or semester. Every week you should do something that brings you closer to achieving those big goals. Within a week, your goals should be SMART. Don't have vague goals like "write code" or "continue reading". It should be obvious whether you succeeded or not in your goal (specific and measurable), it should fit in with the big picture (relevant), and it should be doable in a week (time-bound and attainable).
It's okay to send it earlier than Monday; the weekend is fine too. If you're a day late, that's less good, but better than not sending it at all. However, keep the Monday date in the subject so I can search for it. (Or just reply to the first Snippet that gets sent; no need to wait for me, it's okay to start the chain.)
The snippet should be sent to both the project you're working on and the group e-mail list.
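For anyone who wants to script the subject line, here is a minimal sketch that produces the `[Snippet YYYY MM DD]` format with the Monday date. The function names are hypothetical, and the cutoff is an assumption: snippets sent Friday through Sunday are attributed to the upcoming Monday, while snippets sent Monday through Thursday are attributed to the most recent Monday.

```python
from datetime import date, timedelta


def snippet_monday(sent: date) -> date:
    """Return the Monday a snippet sent on `sent` should be filed under."""
    weekday = sent.weekday()  # Monday is 0, Sunday is 6
    if weekday <= 3:
        # Assumption: Monday through Thursday counts as on time or late
        # for the most recent Monday.
        return sent - timedelta(days=weekday)
    # Assumption: Friday through Sunday counts as early for the next Monday.
    return sent + timedelta(days=7 - weekday)


def snippet_subject(sent: date) -> str:
    """Format the e-mail subject as [Snippet YYYY MM DD]."""
    return snippet_monday(sent).strftime("[Snippet %Y %m %d]")


if __name__ == "__main__":
    print(snippet_subject(date.today()))
```

Keeping the date format fixed is what makes the snippets searchable later, so it is worth generating the subject rather than typing it by hand.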
Outside of that, I prefer face-to-face communication (when I'm not sitting down at my computer being productive) or e-mail as a communication mechanism. Instant messages are also sometimes okay for quick questions, but never send an e-mail and then ask via IM "Did you see my e-mail?"
Do I have to be in lab?
One of the great things about academia is the ability to have a flexible schedule, working when and where you want. However, there are limits to this. On days where we have meetings, it's best if you come in person to those meetings. Within reason, it's okay to join remotely some of the time, but the norm should be to attend in person.
Beyond meetings, it's also good to work a full day at least once a week. It's important to have a place where you can work productively in the lab, be a part of the lab community, and to absorb the lab culture and its tacit knowledge. Don't just appear on campus for meetings and then disappear.
Finally, when we have a big paper deadline, you absolutely must make every effort to be in lab (if you aren't in lab, I expect a far better explanation than "I didn't feel like it"). I work very hard to make sure I'm able to do this (e.g., flying in relatives to help with my daughter), and I expect you to do the same. The most important part of your job is publishing papers, and while there are good electronic tools to facilitate collaboration, they are not a replacement for in-person communication. Particularly for inexperienced students, being around older students working on papers is a very valuable experience. You don't know what you don't know, and you can get valuable information from being in the same room as other people working on papers.
I need you to do something (look over a draft, send an e-mail, etc.). How should I best make sure that happens?
The most important thing is to make sure it's on my radar. If you have an important deadline, make sure it appears in your snippet that you send me weekly. I will make sure I budget my time to ensure that it gets taken care of. Give me as much warning as possible. I get grumpy if I have to rearrange my schedule for you at the last minute.
It's fine (and helpful) for you to remind me. However, I'd like to make the following caveats. Unless the deadline is hours away, the best way is over e-mail; not phone or IM. It's less intrusive and I have systems for dealing with tasks that arrive over e-mail. The frequency of the reminder is also important. No more than once every five days, I would suggest.
Finally, make it as easy as possible for me to do what you need me to do. Have your reminder e-mail reference all of the material I need to do the task. If I'm reviewing a paper, remind me where in the repository it lives and send me a compiled PDF. If I need to write a letter, provide the background material and the contact information in one place.
How important are classes once I'm a PhD student?
One very frequent problem I see is that first-year PhD students want to do very well in their classes and think of research as a hobby.
For RAs, it is very much a job. Your professor has secured funding for PhD students to do research and to produce results. If you fail to produce, it makes the professor look bad to his funders, and the professor will not want to pay you to do research in the future (i.e., like a job, you can get fired).
Grades are not important whatsoever, so long as you're not getting kicked out of the program. You should use classes to become a better researcher, but if you're chasing after an A when a B would suffice and your research suffers, that's detrimental to yourself, your professor, and to science.
If you're not an RA (on fellowship or TA), then doing research is often a tryout for an RA. Unless you're 100% sure you'll have fellowship funding your entire time as a PhD student, you should make sure your professor would take you on as an RA in a heartbeat if needed.
How often should I be publishing?
You should always have an idea that you're actively working on for a paper. Publishing 1-2 papers a year is a good average (however, this does not mean that you'll always have a publication every year). Under normal circumstances, I expect students to have one publication at least submitted before the end of their second year, two by their proposal, and three by their defense (it's of course fine to have more, but don't prioritize quantity over quality).
If you haven't published in two consecutive years as a first author in a top venue, that's a huge problem, and you're unlikely to get an RA in the future.
I'm submitting a paper we talked about, can I add you as an author?
I should not be surprised by a paper. If I'm going to be an author, I want to: 1) see a draft with the "big picture" at least two weeks before the deadline and 2) see a nearly complete draft at least a week before the deadline. (I reserve the right to still say no to papers even if you follow these rules, e.g., if I'm on vacation.)
For students working directly with me in my group, this is less of an issue: I know what's going on and can judge whether we can submit (a collaborative discussion). But for students who come to me to discuss an idea, vanish for two months, and then suddenly appear and want me to be a coauthor, this can be pretty annoying. My likely response is "no": I will not be a coauthor, and I will not contribute to the paper. If you wait until the last minute, the paper likely won't be any good, and I have other papers with authors who were responsible and played by the rules.
You can still choose to submit, but do not list me as an author.
Can I work on projects that don't involve you?
First, there's a question of funding. If you're funded on a fellowship, TA, or self-funded, then you just need to make sure that I'm happy to continue advising you (i.e., making good progress to your degree). It's fine to take a break and explore your interests, but don't ignore your thesis.
However, if you're funded on a grant, you need to be working on work that's consistent with the goals of the grant. Maintaining these relationships is necessary for me (and future students) to have funding. If your only publication in three years has a majority of authors not working on the grant, that will also look suspicious.
This isn't to say that you can never work on a project that doesn't involve me. For example, many students need a week or two to wrap up their internship projects. This is totally fine, and it's not reasonable (or appropriate) for me to get involved. However, if you're still working with your internship advisors six months afterward and it's interfering with your grant-funded work, then I either need to be involved or you need to give it up. At the very least there needs to be a frank conversation between me and the internship host (it's not fair for you to have to manage these conflicting relationships/priorities).
Why is it important to cite related work? Can't you just add the citations for me?
I often ask students to cite papers when we're working on a draft. Sometimes a citation will be very trivial to add (e.g., at the end of a sentence), and students may rightly wonder why I don't just add it myself. Am I really that lazy?
Sometimes I am so rushed that it is indeed partly time pressure that prevents me from citing something myself. But often I say this because I want you to read the paper. It may not be a paper you're familiar with. If I just cite it, then you don't learn the material in the paper (and since this is a paper you're writing, you should know about that material).
Sometimes I'll be deliberately vague ("you should cite Eisner/Dreyer here"). Again, this could be me being lazy, but sometimes multiple papers could be relevant, and I'm not sure which is the best paper to cite in this circumstance. Moreover, particularly when an author (or group of authors) has written a number of papers on a topic, you should be aware of the whole trajectory (and there could be follow-on papers I may not know about).
Cool. So this means I can just ignore your citations until I get to the related work section (which I'll save for last)?
NO! Knowing about previous work could impact all aspects of the paper. You might find out about a dataset, evaluation, or framing of the problem that could help you write other sections of the paper. Science is about standing on the shoulders of giants: if you don't know what has come before, how can you improve on it?
This has become an increasingly vexing issue in the age of deep learning; students believe that neural networks are magic and that any technique that doesn't have a hidden layer and a nonlinearity isn't worth their attention. I can confidently say that this is not the case (at least in 2018), and older or non-neural papers are still worth reading, even if your model is neural.
Academia and Research
Is topic modeling dead? Should we all be doing deep learning?
Deep learning should be part of any modern researcher's toolkit. However, I do not think that this means that we should completely abandon topic models. Topic models are still very useful for use cases where interpretability is important. You'll still see many researchers in digital humanities using topic models, for instance, because they care about telling a good story and understanding their data.
As topic models become more of a utility, I think we'll see less of the "topic model of the week" that we saw 2005-2010. I think the important questions are how to incorporate topic models into real-world workflows and how to measure whether topic models help users with those tasks. At the risk of self-promotion, I think a good example of that is Forough's paper on how topic models help people annotate data more effectively.
One place where we will see less activity is topic modeling as a feature for downstream models, which was quite popular for a while. Here, word embeddings have completely taken over. They obviously do a better job, but perhaps the interpretability of topic models was a nice side effect that we're missing out on.
For a more complete overview of where I think topic modeling has been, what it's been useful for, and where it's going, Yuening, David and I have a new book on Applications of Topic Models.
How should I collect/store data?
Google spreadsheets are good in most cases during collection. But once we're done, they should be stored in a way that's long-term readable (e.g., JSON/CSV) and deposited with a library.
You're part of an iSchool? What's that?
It's fun. Unlike computer science, which can sometimes ignore humans, iSchools care about the intersection of information, technology, and society. It's a good fit for me because I'm interested in computational social science and human-in-the-loop machine learning.
I'm trying to use your code, but I'm having trouble. How should I get help?
E-mail all of the people who worked on the paper associated with the code with
- a minimal (simple as possible) example that can replicate your problem;
- the inputs that replicate your problem (again, this should be as simple as possible; sending multi-megabyte files is usually not minimal);
- exactly what you did (the exact command line used);
- what you expected to see;
- what you got instead (include error messages and any output); and
- what versions of various resources you're using (NLTK, Java, gcc, boost, protocol buffers, etc.).
This information is necessary for us to help you with your problem. The simpler it is to replicate your problem, the faster you will get a response. More complicated setups take longer for us to try out and debug. If your example is simple enough, we can often see the problem ourselves without running code.
Each e-mail should be self-contained. All the information to reproduce the bug should be in one place. This helps us quickly reproduce the bug, and it also ensures that you've not tweaked anything that might prevent us from isolating the issue.
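For Python-based code, part of the checklist above (versions and the exact command line) can be gathered automatically. The following is a minimal sketch using only the standard library; the function name `version_report` and the package list are assumptions for illustration, and non-Python tools such as Java, gcc, or boost would still need to be reported by hand.

```python
import platform
import shlex
import sys
from importlib import metadata


def version_report(packages):
    """Collect interpreter, OS, package versions, and the command line."""
    lines = [f"Python {platform.python_version()} on {platform.platform()}"]
    for name in packages:
        try:
            lines.append(f"{name} {metadata.version(name)}")
        except metadata.PackageNotFoundError:
            lines.append(f"{name} (not installed)")
    # The exact command line used to launch this script.
    lines.append("Command: " + shlex.join(sys.argv))
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical package list; replace with whatever the code depends on.
    print(version_report(["nltk", "numpy"]))
```

Pasting this output into the e-mail alongside the minimal example, inputs, expected behavior, and actual output keeps everything in one place.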
What are your pronoun preferences?
For myself, he/him/his and they/them/their are both fine. I prefer the latter for academic writing and the former for day-to-day communication.
If I use the wrong pronoun for you, please let me know ASAP.
What's up with your name? Why is it hyphenated? What should I call you? Why is your UMD username "ying"?
My parents' last names are Boyd and Graber. When I was born they hyphenated (why people whose nicknames were "Toni the Body" and "Little Grabber" would do so is beyond me; my nickname is obvious). As a result, I am deeply, personally, against hyphenating names. Don't do it. It's not a sustainable practice, and it leads to all sorts of problems. People think my last name is just "Boyd" or "Graber", web forms don't think I have a valid name, and there's only about a forty percent chance someone will get my name right after one telling.
Most people call me Jordan, which is just fine by me. I also answer to JBG.
Our family calls itself the "Ying"s (wife's name). That's why my UMD username is ying (and why "Ying" is listed on Testudo). My wife, who got to UMD after me, is zying.
I'm a TA or grader for one of your courses; what do I need to know?
- First, make sure that we have a meeting before the semester starts.
- Attend at least a class or two to get a feel of what's going on.
- As each assignment is posted, look it over to make sure I haven't done anything stupid (e.g., a confusing problem); it will make your life easier.
- Once assignments arrive, create an ontology of all of the mistakes that people have made (do this before you start "grading"); this will allow you to fairly and consistently deduct points.
- Using that ontology, create a template that you can use to provide feedback to students (e.g. by copy/paste or deleting). This allows you to explain each mistake in detail without having to retype the same thing over and over again. It also ensures that you give consistent feedback for each mistake people make.
- Post a synopsis of the mistakes that people made and how to correct them.
- Never give a grade without explaining why people got the grade they did.
Why did you leave the University of Maryland and then why did you go back?
I came to Colorado (where I was born) to be close to my family (especially my dad, who had some health scares) and to start a family of our own. All of this went according to plan. However, after getting laid off during her pregnancy, for two years my wife was unable to find a science communications position in Colorado.
While my wife's career issues were the biggest reason we left Colorado, I also had professional difficulties in Colorado. I did not get any credit toward tenure at Colorado (after being a professor for four years at Maryland), and I felt the tenure process afterward was needlessly unpredictable and difficult.
I did not receive accurate information about tenure procedures at Colorado, and the department made my path to tenure more difficult than it should have been. I don't want to share too much on a public webpage (and perhaps I already am), but let me give one concrete example that will hopefully be helpful to junior/future tenure-track faculty who might read this and learn from my mistakes.
I had agreed to teach a new data science course in Fall 2016 that the department desperately wanted taught. I agreed to teach a hundred-student section, as I thought that I would have my tenure packet submitted at that point and wanted to be a good citizen. I was then surprised to learn that this class would instead decide whether or not I would get tenure in 2019 (nine years after I started a tenure track position in Maryland); if I didn't do well teaching this class, my tenure would be delayed even further.
I didn't want my tenure to be decided by my first offering of a huge, new class. On February 1, 2016, I asked for either a smaller pilot offering of the class or a TA (other faculty in my area taught 20-person undergrad courses or 48-person undergrad classes with a TA). The course remained at 100 students without a TA. The department leadership changed, and I asked again for help in an August 4, 2016 meeting. The course remained at 100 students without a TA, the largest course without a TA taught by tenured/tenure-track faculty. Needless to say, the course did not go well, and in Fall 2016 I had no idea when or whether I would get tenure at Colorado.
I believe the problem was institutional, not personal. The individuals involved mostly made reasonable decisions given the unusual circumstances, and I don't think there was any personal animosity toward me. I think that they are intelligent, friendly, caring people who want junior faculty to succeed, but did not have the bandwidth to understand my situation, let alone help. University administrators are overwhelmed by too many responsibilities, and a single assistant professor in computer science who spent four years elsewhere is not worth upsetting campus procedures or departmental power dynamics. There were many little mistakes that were made along the way. Each individual mistake is forgivable (and I've forgiven the people involved even if I still feel betrayed by the institution, which stings a little worse as a native Coloradan), and the mistakes were so broadly distributed that everyone could say "this is somebody else's problem, and I'm not going to get worked up about it and try to fix it".
However, these mistakes did happen and were left uncorrected. Together, they created a miserable situation for me and my family. I should have fought harder to prevent those mistakes from being made and complained more loudly after they were. Nevertheless, I was either not persistent, persuasive, or sympathetic enough to get things fixed on my own. I am not the bureaucratic bare-knuckled brawler required to make it on my own at a place like Colorado, and I didn't have the kind of support from senior faculty a crappy negotiator like me would need to survive. In the end, it became clear that if I was going to get tenure and my wife was going to find fulfilling employment, we would need to find our own solution and leave Colorado.
Thankfully, in late 2016, my wife got a great job offer from UMD and six months later I was very happy to be able to follow her (there were some stressful months in between, though!). I was very excited to be returning to the great research environment with supportive senior faculty across computer science (tenure home), UMIACS, language science, and the iSchool (each chipped in for my position).
Outside of working hours, there's a lot I will miss about Colorado (where I was born and paradise on Earth!), and I am hoping for lots of opportunities to spend time there and maintain the professional connections I've made. It's too bad Colorado was such a poor place for us to advance our careers.
I had described going to Colorado as "returning home". However, Maryland made me a postdoc offer when nobody else would in 2009, hired me as faculty when nobody else would in 2010, and then finally put me up for tenure when nobody else would. While Colorado is the place where I was born and where I have the most relatives, Maryland is my academic home, and I'm glad to return.
What's your Erdös-Bacon number?