Social Coding: Pull Requests - What to Do When Things Get Complicated

Engineering | Dave Syer | July 18, 2011 | ...

Scenario: you want to contribute some code to an open source project hosted on a public git repository service like github. Lots of people make pull requests to projects I'm involved in and many times they are more complicated to merge than they need to be, which slows down the process a bit. The basic workflow is conceptually simple:

  1. fork a public open source project
  2. make some changes to it locally and push them up to your own remote fork
  3. ask the project lead to merge your changes with the main codebase

and there is an excellent account of this basic workflow in a blog by Keith Donald.

Complications arise when the main codebase changes in between the time you fork it and the time you send the pull request, or (worse) you want to send multiple pull requests for different features or bugfixes, and need to keep them separate so the project owner can deal with them individually. This tutorial aims to help you navigate the complications using git.

The descriptions here use github domain language ("pull request", "fork", "merge" etc.), but the same principles apply to other public git services. We assume for the purposes of this tutorial that the public project is accepting pull requests on its master branch. Most Spring projects work that way, but some other public projects don't. You can substitute the word "master" below with the correct branch name and the same examples should all be roughly correct.

To help you follow what's going on locally, the shell commands below beginning with "$" can be extracted into a script and run in the order they appear. The endpoint should be a local repository in a directory called "work" that has an origin linked to its master branch (simulating the remote public project) and two branches on a private fork. The two branches have the same contents at their heads, but different commit histories (as per the ASCII diagram at the bottom).

The Two Remote Repositories

If you are going to send a pull request, there are two remote repositories in the mix: the main public project, and the fork where you push your changes.

It's a matter of taste to some extent, but what I like to do is make the main project the remote "origin" of my working copy, and use my fork as a second remote called "fork". This makes it easy to keep track of what's happening in the main project because all I have to do is

# git fetch origin

and all the changes are available locally. It also means that I never get confused when I do my natural git workflow

# git checkout master
# git pull --rebase
... build, test, install etc ...

which always brings me up to date with the main project. I can keep my fork in sync with the main project simply by doing this after a pull from master:

# git push fork
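
Concretely, against a real github project the same arrangement could be set up roughly like this (a sketch only: substitute the real project URL and the URL of your own fork):

# git clone git://github.com/SpringSource/repo.git work
# cd work
# git remote add fork [email protected]:myuserid/repo.git
# git fetch fork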

Initial Set Up

Let's create a simple "remote" repo to work with in a sandbox. Instead of using a git service provider we'll just do it locally in your filesystem (using UN*X commands as an example).

$ rm -rf repo fork work
$ git init repo
$ (cd repo; echo foo > foo; git add .; git commit -m "initial"; git checkout `git rev-parse HEAD`)

(The last checkout there was to leave the repository in a detached head state, so we can later push to it from a clone.) From now on, pretend "repo" is a public github project (e.g. git://github.com/SpringSource/repo.git).
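
(The detached head matters because, by default, git refuses a push that would update the branch currently checked out in a non-bare repository. If all you needed was a push target you could avoid the trick altogether with a bare repository, e.g.

# git init --bare repo.git

but the later steps in this tutorial commit directly in "repo" to simulate upstream changes, which needs a working tree, so we stick with the detached-head approach.)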

The "fork" URL in this clone command would be something like [email protected]/myuserid/repo.git. Now we'll create the fork. This is equivalent to what github does when you ask it to fork a repository:

$ git clone repo fork
$ (cd fork; git checkout `git rev-parse HEAD`)

Finally we need to set up a working directory where we make our changes (remember "repo" = git://github.com/SpringSource/repo.git):

$ git clone repo work
$ cd work
$ git checkout origin/master

Because we cloned the main public repo, it is by default the remote "origin". We are going to add a new remote so we can push our changes:

$ git remote add fork ../fork
$ git fetch fork
$ git push fork

The local repository now has a single commit and looks something like this in gitk (or your favourite git visualization tool):

A (origin/master, fork/master, master)

In this diagram, "A" is the commit label, and in brackets we list the branches associated with the commit.
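
At this point you can sanity-check the set up with something like the following (the exact output varies with your git version):

# git remote -v                        ... should list both "origin" and "fork" ...
# git branch -a                        ... master, plus the remote tracking branches ...
# git log --oneline --decorate --all   ... one commit, decorated with all three branch names ...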

Get the Latest Stuff

You can always get the latest stuff from the main repo using

# git checkout master
# git pull --rebase

and sync it with the fork

# git push fork

If you operate this way, keeping master synchronized between the main repo and your fork as much as possible, and never making any local changes to the master branch, you will never have any confusion about where the rest of the world is. Also, if you are going to send multiple pull requests to the same public project, they will not overlap each other if you keep them separate on their own branches (i.e. not on master).

The Pull Request

When you want to start work on a pull request, start from a master branch fully up to date as above, and make a new local branch

$ git checkout -b mynewstuff

Make changes, test etc:

$ echo bar > bar
$ echo myfoo > foo
$ git add .
$ git commit -m "Added bar, edited foo"

and push it up to your fork repository with the new branch name (not master)

$ git push fork mynewstuff

If nothing has changed in the origin, you can send a pull request from there.
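
If you expect to keep pushing to this branch it can be convenient to set its upstream at the same time, so that a plain "git push" (and "git pull") on the branch does the right thing later:

# git push -u fork mynewstuff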

What if the Origin Changes?

For the purpose of this tutorial we simulate a change in the origin like this:

$ cd ../repo
$ git checkout master
$ echo spam > spam; git add .; git commit -m "add spam"
$ git checkout `git rev-parse HEAD`
$ cd ../work

Now we're ready to react to the change. First we'll bring our local master up to date

$ git checkout master
$ git pull
$ git push fork

The local repository now looks like this:

A -- B (mynewstuff, fork/mynewstuff)
 \
  -- D (master, fork/master, origin/master)

Notice how your new stuff does not have origin/master as a direct ancestor (it's on another branch). This makes it awkward for the project owner to merge your changes. You can make it easier by doing some of the work yourself locally, and pushing it up to your fork before sending the pull request.

Re-writing History on your Branch

If you aren't collaborating with anyone on your branch it should be absolutely fine to rebase onto the latest changes from the remote repo and force a push:

# git checkout mynewstuff
# git rebase master

The rebase might fail if you have made changes that are incompatible with something that happened in the remote repo. You will want to resolve the conflicts and continue the rebase before moving on. This makes life difficult for you, but easy for the remote project owner, because the pull request is guaranteed to merge successfully.
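
If the rebase does stop on a conflict, the loop for dealing with it looks roughly like this:

... edit the conflicted files, then mark them resolved ...
# git add foo
# git rebase --continue
... or, to abandon the attempt and put the branch back where it was ...
# git rebase --abort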

While you are re-writing history, maybe you want to squash some commits together to make the patch easier to read, e.g.

# git rebase -i HEAD~2
...
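
For reference, the todo list that the interactive rebase opens in your editor looks roughly like this (the SHAs and messages here are just placeholders); changing "pick" to "squash" on the second line melds that commit into the one above it:

pick 1a2b3c4 Added bar, edited foo
squash 5d6e7f8 polish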

In any case (even if the rebase went smoothly), if you have already pushed to your fork you will need to force the next push, because the rebase has re-written the branch's history (assuming the remote repo has changed).

# git push --force fork mynewstuff

The local repository now looks like this (the B commit isn't actually identical to the previous version, but the difference isn't important here):

A -- D (master, fork/master, origin/master) -- B (mynewstuff, fork/mynewstuff)

Your new branch now has origin/master as a direct ancestor, so everyone is happy. Then you are ready to go into the github UI and send a pull request for your branch against repo:master.
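
If you want to double-check that claim before heading to the UI, the first command below lists exactly the commits the pull request will contain, and the second should print nothing if origin/master really is an ancestor of your branch:

# git log --oneline origin/master..mynewstuff
# git log --oneline mynewstuff..origin/master   ... should be empty ...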

What if I want to Keep my Local Commits?

If you committed your changes locally in multiple steps, maybe you want to keep all your little bitty commits, but still present your pull request as a single commit to the remote repo. That's OK: you can create a new branch for that and send the pull request from there. This is also a good thing to do if you are collaborating with someone on your feature branch and don't want to force the push.

First we'll push the new stuff to the fork repo so that our collaborators can see it (this is unnecessary if you want to keep the changes local):

$ git checkout mynewstuff
$ git push fork

then we'll create a new branch for the squashed pull request:

$ git checkout master
$ git checkout -b mypullrequest
$ git merge --squash mynewstuff
$ git commit -m "comment for pull request"
$ git push fork mypullrequest

Here's the local repository:

A -- B (mynewstuff, fork/mynewstuff)
 \
  -- D (master, fork/master, origin/master) -- E (mypullrequest, fork/mypullrequest)

You are good to go with this: your new branch has origin/master as a direct ancestor, so it will be trivial to merge.

If you weren't collaborating on the mynewstuff branch, you could even throw it away at this point. I often do that to keep my fork clean:

# git branch -D mynewstuff
# git push fork :mynewstuff
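
The colon refspec (push "nothing" to the remote branch) is what performs the delete; on newer versions of git the same thing can be written more explicitly as

# git push fork --delete mynewstuff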

Here's the local repo, fully synchronized with both of its remotes:

A -- D (master, fork/master, origin/master) -- E (mypullrequest, fork/mypullrequest)

Continue Working on your New Stuff

Let's say your pull request is rejected and the project owner wants you to make some changes, or the new stuff turns into something more interesting and you need to do some more work on it.

If you didn't delete it above, you can continue to work on your granular branch...

$ git checkout mynewstuff
$ echo yetmore > foo; git commit -am "yet more"
$ git push fork

and then move the changes over to the pull request branch when you are ready

$ git rebase --onto mypullrequest master mynewstuff
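
Reading that command in its general form may help: it takes the commits that are on <branch> but not on <upstream> and replays them on top of <newbase>, silently skipping any commit whose changes are already present there (which is why B, already contained in the squashed commit E, is not replayed below):

# git rebase --onto <newbase> <upstream> <branch>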

All the changes we want are in place now, but the branches are pointing at the wrong commits. As you can see below, mynewstuff is where I want mypullrequest to be, and the remote fork/mynewstuff no longer has a local branch pointing at it:

A -- B -- C (fork/mynewstuff)
 \
  -- D (master, fork/master, origin/master) -- E (mypullrequest, fork/mypullrequest) -- F (mynewstuff)

We can use git reset to switch the two branches to where we want them (you can probably do this in a graphical UI if you like):

$ git checkout mypullrequest
$ git reset --hard mynewstuff
$ git checkout mynewstuff
$ git reset --hard fork/mynewstuff

and the new repository looks like this:

A -- B -- C (mynewstuff, fork/mynewstuff)
 \
  -- D (master, fork/master, origin/master) -- E (fork/mypullrequest) -- F (mypullrequest)

If we are OK with the pull request being 2 commits, we can just push it as it is:

$ git checkout mypullrequest
$ git push fork

The endpoint looks like this:

A -- B -- C (mynewstuff, fork/mynewstuff)
 \
  -- D (master, fork/master, origin/master) -- E -- F (mypullrequest, fork/mypullrequest)

Or we could rebase it to squash the commits together and force the push, schematically:

# git rebase -i HEAD~2
...
# git push --force fork

Because origin/master is a direct ancestor of fork/mypullrequest I know that my pull request will be trivial to merge.

Wrap Up

Hopefully this tutorial has given you enough git ammunition to go ahead and make some changes to your favourite open source project and be confident that the merge will be easy. Remember there is always more than one way to do it, and git is a powerful, low-level tool, so your mileage may vary and you might find variants of the approach above preferable or even necessary, depending on your changes.

This week in Spring: July 12th, 2011

Engineering | Josh Long | July 13, 2011 | ...

Welcome back to another installment of "This Week in Spring." Today saw a new sunrise, and - more importantly - the release of vSphere 5, the next step in cloud infrastructure!

My head's still buzzing after the excitement that accompanied this morning's launch.
This - and the recent release of vFabric 5 - represent the next stage in cloud innovation, and a huge part of taking your applications to production, and to the cloud, with Spring.

  • O'Reilly has published a fantastic roundup on the seven Java projects that… (http://radar.oreilly.com/2011/07/7-java-projects.html)

Countdown to Grails 2.0: Static resources

Engineering | Peter Ledbrook | June 30, 2011 | ...

Web applications typically rely heavily on what we call static resources, such as Javascript, CSS and image files. In a Grails application, they are put into a project's web-app directory and then referenced from the HTML. For example,

<link rel="stylesheet" href="${resource(dir: 'css', file: 'main.css')}" type="text/css">

will create a link to the file web-app/css/main.css. All very straightforward. You might even think that the current support is more than sufficient for anyone's needs. What else would you want to do?

That's a good point. The answer depends on the complexity of your application, but let's start with the example CSS link above. Why do we have to type out the <link rel="..." href=...>? Just by looking at the extension, we know that the resource is a CSS file. We also know that CSS files should be linked into an HTML page using the…

This week in Spring: June 28th, 2011

Engineering | Josh Long | June 29, 2011 | ...

Welcome back to another installment of "This Week in Spring."

Lots of great stuff this week, as usual. When we compile this list, we trawl the internet looking for interesting stuff and try to bring it to you, digest style, in this weekly roundup. Some of the resources that we commonly check are Twitter, the SpringSource blogs, CloudFoundry.org, and Tomcat Expert.

We try to not miss anything, but we might. If you know of something that we've missed or think should be included, don't hesitate to ping your humble editors with any suggestions.

While SpringSource has a strong presence at numerous conferences and industry events, the premier conference for Spring developers remains the SpringOne conference, held yearly in the United States. Work is well underway in planning the final program. Check out the SpringOne 2GX page to see news and activity, and to register for the upcoming SpringOne 2GX conference.

  • Spring Social 1.0.0.RC1… (http://www.springsource.org/spring-social/news/1.0.0.rc1-released)

This week in Spring: June 21st, 2011

Engineering | Josh Long | June 22, 2011 | ...

Welcome back to yet another This Week in Spring. SpringSource is out in full force at JAX San Jose this week and we will be at OSCON, in July. These events are great avenues for us to connect with the userbase. As usual, we've got a nice complement of stuff to cover this week, so let's get to it!

  • There has been loads of interest and discussion surrounding last week's Spring 3.1 second milestone (http://blog.springsource.com/2011/06/09/spring-framework-3-1-m2-released/). Sam Brannen writes about the… (http://blog.springsource.com/2011/06/21/spring-3-1-m…

Spring 3.1 M2: Testing with @Configuration Classes and Profiles

Engineering | Sam Brannen | June 21, 2011 | ...

As Jürgen Höller mentioned in his post announcing the release of Spring 3.1 M2, the Spring TestContext Framework(*) has been overhauled to provide first-class testing support for @Configuration classes and environment profiles.

In this post I'll first walk you through some examples that demonstrate these new testing features. I'll then cover some of the new extension points in the TestContext framework that make these new features possible.

      Please note: this is a cross post from my company blog www.swiftmind.com.

Background

In Spring 2.5 we introduced the Spring TestContext Framework which provides annotation-driven integration testing support that can be used with JUnit or TestNG. The examples in this blog will focus on JUnit-based tests, but all features used here apply to TestNG as well.

At its core, the TestContext framework allows you to annotate test classes with @ContextConfiguration to specify which configuration files to use to load the ApplicationContext for your test. By default the ApplicationContext is loaded using the GenericXmlContextLoader which loads a context from XML Spring configuration files. You can then access beans from the ApplicationContext by annotating fields in your test class with @Autowired, @Resource, or @Inject.

Defining the Future for Virtualized and Cloud Java

Engineering | Rod Johnson | June 14, 2011 | ...

Today I am proud to announce version 5 of our VMware vFabric™ application platform defining the future of enterprise Java for cloud and virtualized execution environments. vFabric blazes the path to new and modern cloud architectures by providing a modern programming model paired with next-generation platform services. A path that is not overgrown with the cruft and complexity of prior-generation technologies. With vFabric 5, VMware is ensuring that enterprise Java is ready to meet the challenges of tomorrow’s demanding, data-intensive, massively scalable applications.

vFabric 5 continues to provide the best place to run your Spring applications with vFabric tc Server and the ability to monitor and manage those production solutions with incredible intelligence via vFabric Hyperic. The platform also addresses the technical challenges of cloud computing head on, supporting new approaches to data management that enable applications to scale across elastic, geographically distributed cloud architectures with our vFabric GemFire and RabbitMQ

This week in Spring: June 14th, 2011

Engineering | Josh Long | June 14, 2011 | ...

Welcome back to another installment of "This Week in Spring," and what a week it's been! This last week saw the release of the Spring 3.1 M2 and vFabric 5! Lots of exciting stuff to talk about there, as well as general community news, so let's get to it!

  1. Today VMware announced the release of VMware vFabric 5, the application platform that defines the future of enterprise Java for cloud and virtualized execution environments. vFabric 5 contains many of the technologies that the Spring community is already familiar with including tc Server, Hyperic, GemFire, and RabbitMQ, but now adds some new technology.
    • Elastic Memory for Java (EM4J): a new capability for tc Server that provides a completely new level of coordination between the application server and the underlying virtual machine. EM4J uses the underlying vSphere virtualization to overcome some of the limitations of Java's static memory heap.
    • Spring Insight Operations: leverages the same code-level tracing technology from the Spring Insight project but pulls together information from multiple application servers into a single console with roll-up views, drill downs, and historical comparisons ready for production systems.
    • SQLFire: vFabric SQLFire leverages the time-tested vFabric GemFire underpinnings, providing data at memory speed and horizontal scale, while adding familiar, standard SQL and JDBC interfaces to the service.

    Rod Johnson discusses all the details of the release in his latest blog. Be sure to check out the latest release and try it out.

  2. Spring core lead Juergen Hoeller has announced that Spring 3.1.0 M2 has been released! At long last, the next step on the steady march to Spring 3.1 GA! The new release is as feature-packed as the last one, with a long list of major new features including (but definitely not limited to!) improved Java configuration support, XML-free and hassle-free Servlet 3.0-based Spring MVC application bootstrapping, new Builder APIs for JPA and Hibernate, and much, much more! Check out the release announcement here and get the bits from your build dependency management tool of choice or the download page.
  3. Hot on the heels of the Spring 3.1 release announcement, Chris Beams chimes in (http://blog.springsource.com/2011/06/10/spring-3-1-m2-configuration-enhancements/) on the much-improved Java-centric configuration model in Spring 3.1 M2, even as compared to M1! The features are really starting to come together to make this one of the smoothest, best-arranged releases yet!
  4. Spring 3.1 M2 represents a marked improvement in core Spring, as well as Spring MVC! Rossen Stoyanchev chimes in to introduce the numerous (truly, you'll need to read the detailed blog to…

Spring 3.1 M2: Spring MVC Enhancements

Engineering | Rossen Stoyanchev | June 13, 2011 | ...

This post focuses on what's new for Spring MVC in Spring 3.1 M2. Here are the topics:

  • Code-based equivalent for the MVC namespace.
  • Customizable @MVC processing.
  • Programming model improvements.

A brief reminder that the features discussed here are in action at the Greenhouse project.

Code-based Configuration For Spring MVC

As Chris pointed out in his blog post last Friday, XML namespaces cut down configuration dramatically but also reduce transparency and sometimes flexibility. This holds true for the MVC namespace, which supports a number of customizations but not everything that's available. That means you can either use it as it is, or leave it. We believe code-based configuration has a solution for that and a path from simple to advanced.

Let's begin with this simple, familiar snippet:


<mvc:annotation-driven />

Although not required for using annotated controllers, <mvc:annotation-driven> does a number of useful things -- it detects the presence of a JSR-303 (Bean Validation) implementation and configures data binding with it, it adds a JSON message converter if the Jackson JSON library is available, and a few other things that can save quite a bit of configuration.

Now let's match that with code-based configuration:


@Configuration
@EnableWebMvc
public class WebConfig {
}

Here @EnableWebMvc imports an @Configuration class that matches the goodness of <mvc:annotation-driven>. As simple as that.

The next step is to use an attribute in <mvc:annotation-driven> perhaps to provide a FormattingConversionService, or to add a sub-element perhaps configuring message converters, or to use other MVC namespace elements like <mvc:interceptors>, <mvc:resources>, etc.

Let's see how to do all of that in code-based configuration:


@Configuration
@EnableWebMvc
public class WebConfig extends WebMvcConfigurerAdapter {

    @Override
    public void addFormatters(FormatterRegistry registry) {
        // register converters and formatters...
    }

    @Override
    public void configureMessageConverters(List<HttpMessageConverter<?>> converters) {
        // add message converters...
    }

    @Override
    public void configureInterceptors(InterceptorConfigurer configurer) {
        configurer.addInterceptor(new…

Spring 3.1 M2: Configuration Enhancements

Engineering | Chris Beams | June 10, 2011 | ...

As Juergen mentioned in his post yesterday, and as I've mentioned in my previous posts on 3.1 M1, one of the major themes of Spring 3.1 is completing our vision for code-based configuration in Spring. We think a modern enterprise Java application should have a choice between Java and XML as first class options for its configuration. In this post we'll see how Spring 3.1 M2 helps make this a reality.

Note that although Java-based configuration has been available since Spring 3.0, with this release it is now on par with many more of the XML-based features that have been developed over the…
