How Github Makes Everything About Teaching CS Better

Before I began drafting the game design project, I spoke to a few people about how team projects are done in the professional world. One guy told me plainly, "If you aren't using Git, you're doing it wrong."

I knew I needed to learn how to use Git, but I couldn't figure out where to begin. I understood it was version management software that enabled multiple people to collaborate on a single project, but every tutorial I found made immediate assumptions about words I didn't understand. What was the difference between "Push" and "Commit"? Between "Pull" and "Fetch"? "Repository" and "Branch"? I was able to follow the tutorials, but the tutorials didn't give me an understanding of Git that made me feel comfortable teaching students, let alone debugging issues as they arose. What I needed was an all-in-one tutorial that explained Git like I was a ninth-grader. 

I can't say I've yet mastered Git, but I have learned a whole lot - to the point where I think I could write a tutorial for a ninth-grader. I am sure my technical depth of understanding is limited, but if I can't paint an accurate picture of all of its inner workings, I can at least paint a functional one. That is to say: please forgive me (and kindly correct me) if there is an error in my explanation.

Consider this a starter's manual for using Github in the classroom. There is a lot of information on this page, so I've separated it with headers. The information is intended to read like a tutorial, not a reference, but if you are already familiar with Github, you might just skip to the end.

 

What is Github?

Git is version management software. Github is a website that freely hosts public Git projects (private projects can be hosted for a fee). Eclipse is an Integrated Development Environment (an "IDE", which is essentially a program for programming) that includes integration with Git and, through it, with repositories hosted on Github. Eclipse + Github is how I interact with Git. It doesn't have to be that way (you can actually use Git independently of both Github and an IDE), but it's what I know. So while the principles of Git remain constant across any configuration, I don't have the practical experience to use Git on its own. On its own, Git is typically run through the command line.

As version management software, what Git does is organize changes that are made to a file system and the contents of its files. Git can manage changes made by hundreds of collaborators, so to do this, Git has to be very smart. The foundation of this organization is called a repository. A repository is a designated location for all of the files associated with a project.

Before we continue, something else must be understood about repositories. The people who work on a project are sharing their work over the Internet, so naturally, the repository is stored on a server. If that were the only copy, every collaborator would need an Internet connection in order to work on the project. Sometimes a constant Internet connection isn't an issue, but when it is, it is nice to have a copy of the server's repository on your own machine. Such a copy is called a local repository. Periodically, it is necessary to update the local repository with changes that have been made on the server. This is called "fetching": you fetch changes made to the server's repository.

As was previously stated, a repository is a designated location for all of the files in the project. One repository is online (on Github, usually) and a copy is kept on your machine. Let's suppose three people collaborate on such a repository: Arianna, Brian and Conner, conveniently abbreviated A, B and C. These three are working on a home screen for a Web app. At this point in the project, it fulfills basic functional requirements, but Conner wants to give it a little extra. Let's say he wants to add an up-to-date weather panel. He needs to code this function from the ground up, and he's worried that if he works directly on the current version, the basic version will be rendered non-functional during the debugging process.

What Conner really needs is a "draft" version that he can adjust without the risk of ruining the original version. In Git, this is called a new branch. Branches are different versions of the same repository. Typically, the working, published branch is given the name "master". When Conner makes a new branch, he calls it "c-weather". So now, ABC's repository has two branches: a "master" branch that contains a version of the fully-functioning program and a "c-weather" branch that contains everything from the master branch, plus some changes made by Conner that may or may not work. By switching between branches, one can view the entire file system according to the updates made within that branch.

Just like there's a local repository, there is also a local branch. As Conner continues to work, the changes are made on the local branch. When he is ready to share his work with the rest of his team, he will move his work from the local branch to the branch on the server, a process we will describe shortly.
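
To make the idea of a branch as a "draft" concrete, here is a minimal Python sketch. It is only a toy model and not how Git actually stores anything: each branch is treated as a name that points at its own snapshot of the files. The file names Home.java and Weather.java are made up for illustration.

```python
# Toy model (NOT Git's real storage format): a branch is a name that
# points at a snapshot of the project's files.
repo = {
    "master": {"Home.java": "basic home screen"},  # hypothetical file
}

# Conner creates "c-weather" starting from master's current snapshot.
repo["c-weather"] = dict(repo["master"])

# His experimental work only touches his own branch.
repo["c-weather"]["Weather.java"] = "weather panel (untested)"  # hypothetical file

print(sorted(repo["master"]))     # files visible on master
print(sorted(repo["c-weather"]))  # files visible on c-weather
```

Switching branches in this model simply means looking at a different entry in the dictionary: master never sees Conner's unfinished file.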

The "Git Repository" view in Eclipse. A few of the repositories in this screenshot belong to students who presented in class. My own repository (the one I use for teaching) is in the middle, called "classwork". I have two branches, "master" and "home". Locally, I called my branches "master" and "search-practice". Note that the two master branches are synced. They both are up-to-date with the commit "Updated method variations." The other branches are not synced.

Now, Derek, Eric and Fatima (DEF) come along and admire ABC's open-source project. They say "Hey, what you have is really nice, we'd like to do something similar. Can we work with you?" But ABC are a little protective of their work and they don't think it's wise to give collaborative privileges to a group of strangers. They want a way to let DEF make a new "branch" without actually letting their "branch" be a part of the original repository. So what DEF are given is the ability to "fork" the repository. Forking means creating an independent copy of another repository. By forking ABC's repository, DEF can see all the branches and add, change and delete without bothering ABC.

So if Conner has his own branch, theoretically Arianna could also have her own "a-sports" branch and Brian could have his own "b-business" branch. With all of these different branches, what does team ABC do when they want to combine each branch and make an updated master version? The short answer is they have to merge their branches back into the master branch. When Git merges changes from one branch into another branch, it compares the differences between the two branches and marks anything that might be a conflict. Therefore, the process of merging sometimes involves carefully reviewing and resolving the merge conflicts. It can be rather complicated.

So at this point, you might be wondering, "Which is the best method for my class? Branching or forking?" Answer: neither should be used when you are first learning Git. Now that you know about branching and forking, let's plan on not overwhelming our students. At the beginning of the year, I simplify this whole process by giving each student their own repository with one master branch, and I teach Git in such a way that merge conflicts are often avoided.
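
The merge idea described above can be sketched in a few lines of Python. This is a deliberately simplified model: it compares whole files against the common starting point of the two branches, whereas real Git compares the contents line by line. The function name `merge` and all file names and contents are hypothetical.

```python
def merge(base, ours, theirs):
    """Toy three-way merge on whole files (real Git compares line by line)."""
    merged, conflicts = {}, []
    for name in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(name), ours.get(name), theirs.get(name)
        if o == t:
            result = o              # both branches agree
        elif o == b:
            result = t              # only the other branch changed this file
        elif t == b:
            result = o              # only our branch changed this file
        else:
            conflicts.append(name)  # both changed it differently: a person must decide
            continue
        if result is not None:      # None means the file was deleted
            merged[name] = result
    return merged, conflicts


# Hypothetical file names and contents, chosen only for illustration.
master = {"Main.java": "home screen"}
a_sports = {"Main.java": "home screen", "Sports.java": "scores panel"}
b_business = {"Main.java": "home screen + ticker", "Business.java": "stocks panel"}

merged, conflicts = merge(master, a_sports, b_business)
print(sorted(merged), conflicts)

# If Arianna had also edited Main.java, the merge would flag it.
a_sports["Main.java"] = "home screen + scoreboard"
merged, conflicts = merge(master, a_sports, b_business)
print(conflicts)
```

In the first merge, Arianna and Brian touched different files, so everything combines cleanly. In the second, both changed Main.java in different ways, which is exactly the situation that produces a merge conflict.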

An example of the notation that is automatically generated when a merge conflict is found.

Getting Started

Okay, so we have student Arianna who now has her own repository because she followed the steps on her teacher's class website (and you can too.) Note that Arianna did this herself. It is not necessary that the teacher set up each student's repository. After learning the difference between her Github repository and her local repository, Arianna is now interested in how Github is used. She knows that she will be using the Eclipse IDE to edit code. The files in the Eclipse IDE are a reflection of her local repository. The question, then, is how do you link the local repository to the Github repository? This link is called a remote. Remotes are assigned a name (useful for when you want to connect to multiple repositories) but my students will only ever use one remote. The name of that remote is, by convention, "origin". The remote sends changes from her local branch to her Github repository in a process called pushing and receives changes from the Github branch to her local branch in a process called pulling.

 

Pushing

While Arianna works in Eclipse, she can save changes the old-fashioned way as often as she pleases. Occasionally, however, she might want to "dog-ear" a particular change in her program. That is to say, she may want to mark a change so that she can return to it, or she may want to update the repository with the change. Such a marked change is called a commit. Updating the server's repository with the change is called pushing a commit. While you can commit without pushing, you would only do so for the same reason you dog-ear a page of a book: to return to it later. Going from a local repository to the server is often called "pushing upstream". The remote is the instrument that handles the push. Technically (but in no particular order), it verifies the commit (all commits must have a message to describe the change), checks that there are no commits upstream that the user hasn't already integrated into his or her local branch, verifies the user's credentials (password), then updates the correct branch with the new commit. Configuring the remote for pushing is as simple as specifying which local branch gets pushed to which server branch. Because Arianna only has one branch, this is a no-brainer: master → master.
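
The difference between committing and pushing can be sketched with two plain Python lists. This is a toy model under a big assumption: the server never has commits that the local repository lacks (the case where that assumption fails is covered under "Pulling"). The names `local`, `server`, `commit` and `push` are invented for this sketch and are not Git or Eclipse functions.

```python
local = []    # commits in Arianna's local repository
server = []   # commits in her Github repository


def commit(message, files):
    """Record a snapshot locally; nothing leaves the machine yet."""
    if not message.strip():
        raise ValueError("a commit needs a message")
    local.append({"message": message, "files": sorted(files)})


def push():
    """Copy any local commits the server does not have yet.

    Assumes the server's history is a prefix of the local history.
    """
    server.extend(local[len(server):])


commit("Add sample element class", ["SampleElement.java"])
commit("Finish arrays practice", ["ArraysPractice8.java"])
print(len(local), len(server))   # commits exist locally, none on the server yet
push()
print(len(local), len(server))   # the server has now caught up
```

Committing twice before pushing once is perfectly normal: the dog-eared pages stay on Arianna's machine until she decides to share them.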

This is the "Git Staging" view in Eclipse. ArraysPractice8.java and SampleElement.java are "staged" to commit. ArraysMain.java is not staged because I do not wish to commit it at this time. A commit message is required before I can "Commit and Push..."

Let's suppose Arianna is at a computer in the library. The library doesn't have Eclipse on its computers, but it has a browser. Arianna can actually log in to her Github account and edit her code through the browser. This is, in fact, even easier: since Github is operating on the server, no remote is necessary. She's already working in the server repository!

Pulling

So, let's suppose the next day, Arianna comes into class and attempts to push her code again. She's going to run into a problem. If Arianna made commits from the library, the remote will see that there are upstream commits that aren't already integrated into her local branch. It's going to give an error with the label "non-fast-forward," meaning that before she can push her new commits, her local branch must incorporate all prior upstream commits. Incorporating upstream commits into your local branch is called pulling. Note the difference between fetching and pulling: fetching updates your local repository to reflect the server's repository. Pulling fetches and then merges those updates into your local branch, the branch where you are doing your work. Fetching on its own isn't very useful if you only have one branch. You might fetch if you wanted to check out (look at) two different branches and compare them before deciding which one to pull.

Arianna pulls her code from the library and now that her local branch is up-to-date, there will be no problems the next time she attempts to push.
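
Arianna's situation can be sketched in Python as a comparison of two commit histories. This is a simplified model, assuming the library commit and the classroom commit do not conflict. Real Git merges the two histories rather than just lining the commits up. The function names and commit labels are made up for illustration.

```python
def can_fast_forward(server, local):
    """A push is accepted only if local history starts with everything on the server."""
    return local[:len(server)] == server


def pull(server, local):
    """Simplified pull: take the server's history, then keep local-only commits on top.

    Assumes the commits do not conflict with each other.
    """
    shared = 0
    while shared < min(len(server), len(local)) and server[shared] == local[shared]:
        shared += 1
    return server + local[shared:]


server = ["c1", "c2", "library-fix"]      # she committed from the library's browser
local = ["c1", "c2", "classroom-work"]    # her classroom machine never saw that commit

print(can_fast_forward(server, local))    # the push is rejected: non-fast-forward
local = pull(server, local)
print(can_fast_forward(server, local))    # after pulling, the push would be accepted
```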

 

Problems That Often Arise in Class

The most common problem with pushing occurs when there are upstream commits that need to first be pulled. Occasionally, those upstream commits conflict with what you've done in your local branch. When that happens, those conflicts are annotated with marks that signify a merge conflict. The marks separate two groups of code: one group that reflects the local branch, or "HEAD", and another group that reflects the changes upstream. The user must resolve these conflicts by deleting the irrelevant code along with the marks themselves. Once the merge conflict has been resolved and committed, the commit can be safely pushed.
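
For readers who have not seen the marks, the Python sketch below holds a small conflicted file and a helper that keeps one side of each conflict. The `<<<<<<<`, `=======` and `>>>>>>>` lines are the marks Git writes. The Java snippet, the label after the closing mark (it varies by tool) and the `resolve` helper are made up for illustration. In practice, resolving a conflict is usually done by hand and may mean combining both sides rather than discarding one.

```python
# A small conflicted file. The text after ">>>>>>>" is an illustrative label.
conflicted = """\
public class ArraysMain {
<<<<<<< HEAD
    int size = 10;
=======
    int size = 20;
>>>>>>> origin/master
}
"""


def resolve(text, keep):
    """Keep one side of every conflict: 'local' (HEAD) or 'upstream'."""
    out, side = [], None
    for line in text.splitlines():
        if line.startswith("<<<<<<<"):
            side = "local"            # the lines that follow come from HEAD
        elif line.startswith("=======") and side == "local":
            side = "upstream"         # the lines that follow come from upstream
        elif line.startswith(">>>>>>>") and side == "upstream":
            side = None               # the conflict is over
        elif side is None or side == keep:
            out.append(line)
    return "\n".join(out) + "\n"


print(resolve(conflicted, "local"))
print(resolve(conflicted, "upstream"))
```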

Sometimes, in the worst of cases, a merge conflict may be too problematic to resolve. When this happens, it might be easier to force push. To force push means to deliberately override any and all upstream commits. The option to force push appears as a check box in a dialog box that appears when you select the option to Push with options ("Push..." as opposed to simply, "Push".)   

 

Sometimes (but usually only when directions were not properly followed) a student really, really messes things up. Eclipse might deny any attempts to push, providing an unknown error message. There have been times when I can't tell what the student did and the student also cannot remember what he or she did to stop absolutely anything from working at all. When this happens, I often delete the Git repository (right click on the repository in the Git Repository view and select "Delete repository") and all associated working trees and directories. It is then possible to import the project from Github and start fresh. (Instructions for importing are also on this website.)

Advantages

Now I'd finally like to address the title of this post, "How Github Makes Everything About Teaching CS Better." Let me start by saying this is my first year using Github from the very first week of school. Last year I only used it for the final project at the end of the year. Wow. What a difference it makes to get students accustomed to Git from Day One. Here are all of the reasons why using Github is the best thing I ever did for my instruction, as well as a few additional tips:

1. Notetaking

Like my students, I use the Eclipse IDE when I write code in class. The computer is hooked up to a projector and students can follow along, copying the code and comments as I write. Git provides an added resource for students, however: at the end of the day, by simply pushing my code, everything I have done in class gets updated to my online repository. With a link to my repository, every student can view my original code. This is helpful for students who have difficulty keeping up on their own machines or for students who are out that day.

2. Reliable Backups

Before I started using Git, students were bringing in their own flash drives to back up their code and bring it from school to their home computers. Not only is this cumbersome, but the teacher desk in the computer room currently stows about twelve flash drives that have been accidentally left behind. With Github, students have their repository on the server, a local repository at school, and a local repository at home. Flash drives are obsolete and the backup system is almost foolproof. Even if a student could somehow corrupt both the repository at home AND the repository on the server, when the student comes into class, I can help him or her restore everything by force pushing from the repository at school.

3. Availability

Github allows you to edit your repository through the browser. Occasionally, I get a student who is having trouble setting up Eclipse at home. He or she says it is preventing them from doing their homework. But when I show my students that, using Github, they can even do their homework on their phones, excuses go out the window.

4. Presenting Code and Grading

At the beginning of the year, I have all of my students fill out a Google form to provide their Github URIs. Google then generates a spreadsheet that organizes all of that information. Throughout the year, if I ever want to see what a student has been doing, no matter where I am, I can either pull his or her repository into Eclipse or view it online through Github. That means I can grade whenever and wherever I like, and if a student has to present in front of the class, it only takes about 30 seconds to import all of his or her code onto my machine. It's astonishing to think I actually used to have my students copy, paste and email their source code.

5. Working in Groups 

I recently had students complete a project, working in groups of four. At the end of the year, they will complete a project as a whole class. Between now and then, I will teach them how to use branches, but until then, I have simplified the procedures for setting up and collaborating on a Git repository. One student acts as the administrator. This student creates a repository on his or her own Github account. The student must manually add the user names of the other members of the group to the repository's list of collaborators, which can be found in Settings. The admin is solely responsible for the Main class (the class that contains the 'main' method). All of the other members of the group create and maintain their own class files. As long as every student only makes edits to their own class file and the admin is the only one who makes edits to the Main class, merge conflicts are completely avoided. The students must be taught that they cannot push their own code until after they have pulled commits by the other members of their team.

With these norms in place, working together is easy. Not only is it easy, but it provides information that is not available any other way. In Eclipse (and on Github) you can view the history of commits on a given branch. This history shows every commit by every student and the time of the commit. A teacher can see if a student is not pulling his or her own weight or if a student appears to be procrastinating. Furthermore, using the history, a teacher can check out the project at any point in its history. For example, I assigned a project that was due Friday at 2:00pm. It took nearly a week for all of the groups to present, but the group that presented last had no advantage over the group that presented first: I only checked out commits that had been made before the deadline!
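
The two uses of the commit history mentioned above (seeing who contributed, and grading only what existed at the deadline) can be sketched in Python. In Eclipse and on Github I do this by reading the history view, not by running code. The commit data, student names, dates and function names below are all made up for illustration.

```python
from collections import Counter
from datetime import datetime

# Made-up commit history: (time of commit, student, message).
history = [
    (datetime(2015, 3, 4, 10, 15), "admin", "Set up Main class"),
    (datetime(2015, 3, 5, 19, 30), "student_b", "Add Player class"),
    (datetime(2015, 3, 6, 13, 55), "student_c", "Add Enemy class"),
    (datetime(2015, 3, 9, 8, 5), "student_b", "Late polish after the deadline"),
]
deadline = datetime(2015, 3, 6, 14, 0)  # a made-up Friday, 2:00pm


def commits_by_student(history):
    """Count how many commits each student made."""
    return Counter(student for _, student, _ in history)


def last_commit_before(history, deadline):
    """Return the newest commit made at or before the deadline, or None."""
    on_time = [commit for commit in history if commit[0] <= deadline]
    return max(on_time, key=lambda commit: commit[0]) if on_time else None


print(commits_by_student(history))
print(last_commit_before(history, deadline))
```

The commit returned by `last_commit_before` is the one a teacher would check out in order to grade the project exactly as it stood at the deadline.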

Students must be added as Collaborators via the Settings tab in Github. Public repositories are visible and can be forked by anyone. In order to push to a repository, one must own the repository or be listed as a collaborator.

Next Steps

I've only scratched the surface of what Git can do. I would encourage you to learn more about how you can use Git by learning more about multiple branches, which is really a necessity if you are planning on assigning a major project that might go through multiple versions. Something else you might try is looking at real-world, open source projects via Github. One could even fork such projects and customize them in class! While the learning curve may seem a bit intimidating in the beginning, I am sure your students will enjoy the possibilities that become available to them once managing a repository becomes routine.