Susan Potter

Tracking diffs by scoping to file, range, function, method, or class changes in Git

Tue December 12, 2020

One common question I see from developers using Git is how they can review the history of one function, method, or class over time through Git's history of the project.

In codebases that have evolved over years, a developer just wants to know how one particular semantic scope of code has changed over time rather than on a file or directory basis.

We will start out by revisiting how to scope change diffs per file.

We will use the Ruby language repository to demonstrate the commands in this blog post, so please clone the repository like so:

$ git clone https://github.com/ruby/ruby.git
Cloning into 'ruby'...
remote: Enumerating objects: 91, done.
remote: Counting objects: 100% (91/91), done.
remote: Compressing objects: 100% (77/77), done.
remote: Total 488812 (delta 25), reused 63 (delta 7), pack-reused 488721
Receiving objects: 100% (488812/488812), 229.77 MiB | 2.78 MiB/s, done.
Resolving deltas: 100% (375083/375083), done.

Scoping log diffs per file

Sometimes a developer only wants to look at changes in one specific file in the repository. To do this we would use the git-log command:

$ git log -- README.md
commit 459670d47f8528db8f5d4f28aeac191b1af66d81
Author: David Rodríguez <deivid.rodriguez@riseup.net>
Date:   Sun Mar 8 10:21:18 2020 +0100

    Fix bundled gems installation on a fresh clone

commit adc303131187654d8ce83f3db17eefa3d5bae26c
Author: Kazuhiro NISHIYAMA <zn@mbf.nifty.com>
Date:   Sat Feb 1 00:36:58 2020 +0900

    README*.md: `defines.h` moved [ci skip]

    at 2b592580bf65040373b55ff2ccc3b59a0a231a18

commit 2d61684e7c334ae4c5eb845c782d5fabeffdea67
Author: Nobuyoshi Nakada <nobu@ruby-lang.org>
Date:   Sun Jan 19 21:15:23 2020 +0900

    README.md: removed the badge for Cygwin [ci skip]

    The workflow for Cygwin has been removed at
    3344f811074e1e6119eec23684013457dab4f8b0.

commit 1a1862236da60e21e51c66543e89bf577b6ed14a
Author: Kazuhiro NISHIYAMA <zn@mbf.nifty.com>
Date:   Wed Jan 1 00:02:01 2020 +0900

    Update GitHub Actions Badges

[TRUNCATED]

This will show only the log message and metadata about commits that contain changes in that file.
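To make the file scoping concrete, here is a minimal sketch in a throwaway repository rather than the ruby clone. The file names and commit messages are made up for illustration; --format and -p are standard git-log options.

```shell
#!/bin/sh
set -eu

# Throwaway repository; all file names and commit messages are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

echo "first line" > README.md
echo "a" > other.txt
git add README.md other.txt
git commit -q -m "Add README and other file"

echo "b" >> other.txt
git commit -q -am "Touch other file only"

echo "second line" >> README.md
git commit -q -am "Update README"

# Subjects of the commits that touched README.md; the middle commit is skipped.
readme_log=$(git log --format=%s -- README.md)
printf '%s\n' "$readme_log"

# Same scope, this time with the patch, which is also limited to README.md.
git --no-pager log -p --format='commit %h %s' -- README.md
```

Adding -p is handy when the log message alone does not say what changed in the file.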

Scoping diffs in a line range of a file

In many projects each source file has a predefined documentation header and we only want to find the change that introduced an inconsistency in the documentation header of a particular file.

To find this we might do the following in our ruby repository:

$ git log -L 1,9:vm.c
commit 79df14c04b452411b9d17e26a398e491bca1a811
Author: Koichi Sasada <ko1@atdot.net>
Date:   Tue Mar 10 02:22:11 2020 +0900

    Introduce Ractor mechanism for parallel execution

    This commit introduces Ractor mechanism to run Ruby program in
    parallel. See doc/ractor.md for more details about Ractor.
    See ticket [Feature #17100] to see the implementation details
    and discussions.

    [Feature #17100]

    This commit does not complete the implementation. You can find
    many bugs on using Ractor. Also the specification will be changed
    so that this feature is experimental. You will see a warning when
    you make the first Ractor with `Ractor.new`.

    I hope this feature can help programmers from thread-safety issues.

diff --git a/vm.c b/vm.c
--- a/vm.c
+++ b/vm.c
@@ -1,9 +1,9 @@
 /**********************************************************************

-  vm.c -
+  Vm.c -

   $Author$

   Copyright (C) 2004-2007 Koichi Sasada

 **********************************************************************/

commit 6cdef2dc7e8a4098727de5befff8b2496fa71430
Author: akr <akr@b2dd03c8-39d4-4d8f-98ff-823fe69b080e>
Date:   Sun Jan 6 15:49:38 2008 +0000

    * $Date$ keyword removed to avoid inclusion of locale dependent
      string.


    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14912 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

diff --git a/vm.c b/vm.c
--- a/vm.c
+++ b/vm.c
@@ -1,10 +1,9 @@
 /**********************************************************************

   vm.c -

   $Author$
-  $Date$

   Copyright (C) 2004-2007 Koichi Sasada

[TRUNCATED]

This will show all commits containing changes in lines 1 through 9 inclusive in the file vm.c along with patch diff output for that part of the file.
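Absolute line numbers drift as a file grows, so it can help to anchor the range with a regex and an offset instead, using the /regex/ and +offset forms that git-log accepts for -L. The sketch below uses a scratch repository with a made-up notes.txt; nothing here comes from the ruby repository.

```shell
#!/bin/sh
set -eu

# Scratch repository; notes.txt and the commit messages are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

printf '%s\n' '# header' 'version: 1' '' 'body line' > notes.txt
git add notes.txt
git commit -q -m "Add notes"

printf '%s\n' '# header' 'version: 1' '' 'body line changed' > notes.txt
git commit -q -am "Change body"

printf '%s\n' '# header' 'version: 2' '' 'body line changed' > notes.txt
git commit -q -am "Bump version"

# Start at the first line matching the regex and cover two lines in total.
# --no-patch hides the diffs so only the commit subjects remain.
header_log=$(git log --format=%s --no-patch -L '/^# header/,+2:notes.txt')
printf '%s\n' "$header_log"
```

The "Change body" commit only touches line 4, so it should not appear in the output.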

Scoping diffs by named block in a file

In large files or when blocks (such as functions, methods, or classes) of code have been moved around the file, we might want to limit change log noise especially when that file is updated regularly. A typical example in a Ruby on Rails application might be an action method in a controller.

Let's consult the man page for git-log like so:

$ man git-log

We eventually come across a part like the following:

      -L <start>,<end>:<file>, -L :<funcname>:<file>
          Trace the evolution of the line range given by "<start>,<end>" (or the
          function name regex <funcname>) within the <file>. You may not give any
          pathspec limiters. This is currently limited to a walk starting from a
          single revision, i.e., you may only give zero or one positive revision
          arguments, and <start> and <end> (or <funcname>) must exist in the starting
          revision. You can specify this option more than once. Implies --patch. Patch
          output can be suppressed using --no-patch, but other diff formats (namely
          --raw, --numstat, --shortstat, --dirstat, --summary, --name-only,
          --name-status, --check) are not currently implemented.

          <start> and <end> can take one of these forms:

          •   number

              If <start> or <end> is a number, it specifies an absolute line number
              (lines count from 1).

          •   /regex/

              This form will use the first line matching the given POSIX regex. If
              <start> is a regex, it will search from the end of the
              previous -L range, if any, otherwise from the start of file. If <start>
              is “^/regex/”, it will search from the start of file. If
              <end> is a regex, it will search starting at the line given by <start>.

          •   +offset or -offset

              This is only valid for <end> and will specify a number of lines before
              or after the line given by <start>.

          If “:<funcname>” is given in place of <start> and <end>, it is a regular
          expression that denotes the range from the first funcname line that matches
          <funcname>, up to the next funcname line.  “:<funcname>” searches from the
          end of the previous -L range, if any, otherwise from the start of file.
          “^:<funcname>” searches from the start of file.

Ok, we have already seen how to list the relevant log entries with patches for a line range in a file (in the section above) and now want to take advantage of the form -L :<funcname>:<file>.

To look at all changes in the main function of the ext/nkf/nkf-utf8/nkf.c file in the ruby repository we would issue the following command:

$ git log -L :main:ext/nkf/nkf-utf8/nkf.c

Cool, then armed with this new power we should be able to look at commits and relevant patches within a Ruby function too, right? Let's give that a try:

$ git log -L :request_uri:lib/uri/http.rb
commit 107ba65fba13bdf791e5dae0305c5768e6f7d122
Author: hsbt <hsbt@b2dd03c8-39d4-4d8f-98ff-823fe69b080e>
Date:   Fri Sep 30 10:06:24 2016 +0000

    * lib/uri/http.rb: Documentation and code style imrovements.
    * test/uri/test_http.rb: Added test for coverage.
      [fix GH-1427][ruby-core:77255][Misc #12756]

    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@56298 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

diff --git a/lib/uri/http.rb b/lib/uri/http.rb
--- a/lib/uri/http.rb
+++ b/lib/uri/http.rb
@@ -98,12 +102,11 @@
     def request_uri
-      return nil unless @path
-      if @path.start_with?(?/.freeze)
-        @query ? "#@path?#@query" : @path.dup
-      else
-        @query ? "/#@path?#@query" : "/#@path"
-      end
+      return unless @path
+
+      url = @query ? "#@path?#@query" : @path.dup
+      url.start_with?(?/.freeze) ? url : ?/ + url
     end
   end

   @@schemes['HTTP'] = HTTP
+
 end

commit a5c923f6c1ab0ddd68c4debb7c68623ff0cf4e6a
Author: naruse <naruse@b2dd03c8-39d4-4d8f-98ff-823fe69b080e>
Date:   Tue Aug 5 19:09:01 2014 +0000

    * lib/uri/http.rb (URI::HTTP#request_uri): optimized.
      decrease object allocation, and ensure always create at least one new
      object for return value.

    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@47072 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

diff --git a/lib/uri/http.rb b/lib/uri/http.rb
--- a/lib/uri/http.rb
+++ b/lib/uri/http.rb
@@ -95,12 +95,12 @@
     def request_uri
-      r = path_query
-      if r && r[0] != ?/
-        r = '/' + r
+      return nil unless @path
+      if @path.start_with?(?/.freeze)
+        @query ? "#@path?#@query" : @path.dup
+      else
+        @query ? "/#@path?#@query" : "/#@path"
       end
-
[TRUNCATED]

This works but you will notice some of the patches show changed lines outside of the method block.

How does this work?

One key observation is that in the root of the ruby repository is a file named .gitattributes. This can do many things but for the purposes of block-based git logs and patch review, the important line that made the above command mostly work is the following:

*.rb diff=ruby

This tells Git to use the diff driver named ruby for every file ending in an .rb extension. For diffing purposes, that driver uses a regex to determine the block boundaries in Ruby files:

PATTERNS("ruby", "^[ \t]*((class|module|def)[ \t].*)$",
   /* -- */
   "(@|@@|\\$)?[a-zA-Z_][a-zA-Z0-9_]*"
   "|[-+0-9.e]+|0[xXbB]?[0-9a-fA-F]+|\\?(\\\\C-)?(\\\\M-)?."
   "|//=?|[-+*/<>%&^|=!]=|<<=?|>>=?|===|\\.{1,3}|::|[!=]~"),
Snippet of Git source code from userdiff.c

This identifies class, module, function, or method definitions as named blocks. The start of the regular expression looks for optional spaces or tabs preceding a class, module, or def keyword, which must be followed by a space or tab.

The -L :<funcname>:<file> argument to the git log subcommand finds the first named marker matching that regex and tracks everything from there up to the next named marker. This is why we see more than just the changes within the request_uri method definition in the example in the previous section.
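We can reproduce the "up to the next named marker" behaviour in a small scratch repository. This is only a sketch: the Greeter class, its methods, and the commit messages are invented for the demonstration, and it relies on Git's built-in ruby diff driver being selected through .gitattributes.

```shell
#!/bin/sh
set -eu

# Scratch repository; greeter.rb and its contents are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

echo '*.rb diff=ruby' > .gitattributes
cat > greeter.rb <<'EOF'
class Greeter
  def hello
    "hi"
  end

  GREETING = "x"

  def bye
    "bye"
  end
end
EOF
git add .gitattributes greeter.rb
git commit -q -m "Add Greeter"

# Small helper to edit the file without relying on a non-portable sed -i.
edit() { sed "$1" greeter.rb > greeter.rb.tmp && mv greeter.rb.tmp greeter.rb; }

# Change a line that sits after hello's "end" but before "def bye".
edit 's/GREETING = "x"/GREETING = "y"/'
git commit -q -am "Change constant after hello"

# Change the body of the next method.
edit 's/"bye"/"goodbye"/'
git commit -q -am "Change bye"

hello_log=$(git log --format=%s --no-patch -L ':hello:greeter.rb')
printf '%s\n' "$hello_log"
```

The constant lives outside the hello method, yet its commit is listed because the tracked range runs from "def hello" to the line before "def bye". The change inside bye is not listed.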

For most purposes this is good enough for quick and dirty filtering of noise from git logs.

Tracking changes in markdown document sections

Now let us say we want to see a log of all commits that changed the section 'Features of Ruby' in the README.md file at the root of the ruby repository.

Let us give that a try:

$ git log -L :Features\ of\ Ruby:README.md

This gives me a rather nasty error like so:

fatal: -L parameter 'Features of Ruby' starting at line 1: no match

Not the best error message, but based on the last subsection ('How does this work?') I have a hunch. Let's find where .gitattributes specifies that README.md is a markdown file:

$ grep markdown .gitattributes

It shows me nothing. We need to tell Git to assume that all *.md files are of type markdown which we can do by adding the following line:

*.md diff=markdown
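Before retrying, we can confirm that Git picked up the attribute with git check-attr. The following is a small sketch in a scratch repository (the README.md content is invented); in the ruby clone you would append the line to the existing .gitattributes instead.

```shell
#!/bin/sh
set -eu

# Scratch repository; the README.md content is hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q

printf '%s\n' '# Title' '' '## Features' '' '* one' > README.md

# No diff driver is assigned to the file yet.
before=$(git check-attr diff -- README.md)
printf '%s\n' "$before"

# Assign the markdown diff driver to all *.md files.
echo '*.md diff=markdown' >> .gitattributes

after=$(git check-attr diff -- README.md)
printf '%s\n' "$after"
```

The second check should report markdown as the value of the diff attribute for README.md. As a side note, quoting the whole -L argument, as in git log -L ':Features of Ruby:README.md', avoids the backslash-escaped spaces.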

Retrying the git log command above will show us only commits and their patches that contain changes to that section of the markdown file README.md as expected now:

$ git log -L :Features\ of\ Ruby:README.md
commit dbe834ab5ac4f90df5db9fc314b45890726cca3b
Author: Takashi Kokubun <takashikkbn@gmail.com>
Date:   Mon Jul 1 01:04:40 2019 +0900

    Prefer master rather than trunk in README [ci skip]

diff --git a/README.md b/README.md
--- a/README.md
+++ b/README.md
@@ -13,15 +13,15 @@
 ## Features of Ruby

 *   Simple Syntax
 *   **Normal** Object-oriented Features (e.g. class, method calls)
 *   **Advanced** Object-oriented Features (e.g. mix-in, singleton-method)
 *   Operator Overloading
 *   Exception Handling
 *   Iterators and Closures
 *   Garbage Collection
 *   Dynamic Loading of Object Files (on some architectures)
 *   Highly Portable (works on many Unix-like/POSIX compatible platforms as
     well as Windows, macOS, Haiku, etc.) cf.
-    https://github.com/ruby/ruby/blob/trunk/doc/contributing.rdoc#platform-maintainers
+    https://github.com/ruby/ruby/blob/master/doc/contributing.rdoc#platform-maintainers



commit 4fb5888a4dbc10b6f6d3f847f680baae60b9f757
Author: kazu <kazu@b2dd03c8-39d4-4d8f-98ff-823fe69b080e>
Date:   Fri Jun 15 00:19:05 2018 +0000

    Update obsoleted URLs of supported platforms [ci skip]

    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@63666 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

diff --git a/README.md b/README.md
--- a/README.md
+++ b/README.md
@@ -11,15 +11,15 @@
 ## Features of Ruby

 *   Simple Syntax
 *   **Normal** Object-oriented Features (e.g. class, method calls)
 *   **Advanced** Object-oriented Features (e.g. mix-in, singleton-method)
 *   Operator Overloading
 *   Exception Handling
 *   Iterators and Closures
 *   Garbage Collection
 *   Dynamic Loading of Object Files (on some architectures)
 *   Highly Portable (works on many Unix-like/POSIX compatible platforms as
     well as Windows, macOS, Haiku, etc.) cf.
-    https://bugs.ruby-lang.org/projects/ruby-trunk/wiki/SupportedPlatforms
+    https://github.com/ruby/ruby/blob/trunk/doc/contributing.rdoc#platform-maintainers



commit f4ae225b04ae0cde3aa2781c82875074da49086b
[TRUNCATED]

Defining new named blocks for new formats and file types

Now what happens if I wanted to write my documentation in orgmode format instead of markdown like all good emacsers?

Let us try the following:

  1. We will add an entry to .gitattributes file to tell Git to treat files matching the pattern *.org as org files.

  2. Write orgmode files over multiple commits changing parts of different sections.

  3. Try the git log -L :<funcname>:<filename> command like above.

Unfortunately this alone will not work. What we must also do is open up our user ~/.gitconfig and add the following [diff "org"] section:

[diff "org"]
  xfuncname = "^ *\\*{1,6}[ \t].*"

Now if we try it we will see what we are looking for.
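The three steps plus the config change can be tried end to end in a scratch repository. This is a sketch under a few assumptions of mine: guide.org and its headings are invented, the xfuncname is set per repository with git config instead of editing ~/.gitconfig, and [[:space:]] replaces [ \t] because a \t typed on the command line is not expanded to a tab the way it is inside a quoted config file value.

```shell
#!/bin/sh
set -eu

# Scratch repository; guide.org and the commit messages are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

# Step 1: treat *.org files as type "org" and teach Git what a heading is.
echo '*.org diff=org' > .gitattributes
git config diff.org.xfuncname '^ *\*{1,6}[[:space:]].*'

# Step 2: change different sections over several commits.
printf '%s\n' '* Install' 'Run make.' '' '* Usage' 'Run the binary.' > guide.org
git add .gitattributes guide.org
git commit -q -m "Add guide"

printf '%s\n' '* Install' 'Run make.' '' '* Usage' 'Run the binary with --help.' > guide.org
git commit -q -am "Expand usage"

printf '%s\n' '* Install' 'Run make install.' '' '* Usage' 'Run the binary with --help.' > guide.org
git commit -q -am "Fix install step"

# Step 3: follow only the Usage section.
usage_log=$(git log --format=%s --no-patch -L ':Usage:guide.org')
printf '%s\n' "$usage_log"
```

Only the commits that touched the Usage section should be listed; the install fix is filtered out.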

As an exercise, you could try building a regular expression for a file format whose named blocks Git does not recognize out of the box, then adding it as the xfuncname setting under the relevant diff section of your Git config file.

Limitations

One big limitation of this last approach is that it is based on the name of the block, as matched by the regular expression in xfuncname in the relevant diff config section. The name has to match in the revision the walk starts from, so a block that was renamed at some point cannot be looked up by its old name from the current revision.

Two related options for git-log include:

  • -S <TERM>: which finds commits that change the number of occurrences of the specified string

  • -G <REGEX>: which finds commits whose patch contains added or removed lines matching the regular expression

I have the following git aliases defined for each:

[alias]
  # ... truncated
  search = "log --all --pretty=oneline -S"
  egrep  = "log --all --pretty=oneline -G"

Then I can use git egrep "^\s*module\s+" to search for all commits that contain something that resembles a module declaration in Ruby.
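The difference between the two options shows up when a matching line is modified without changing how often the search term occurs. Below is a small sketch in a scratch repository; app.rb and the commit messages are made up, and the regex uses POSIX character classes, which is my choice for portability rather than something the aliases require.

```shell
#!/bin/sh
set -eu

# Scratch repository; app.rb and the commit messages are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

printf '%s\n' 'module Foo' 'end' > app.rb
git add app.rb
git commit -q -m "Add Foo module"

# Re-indent the module: the text "module Foo" still occurs exactly once.
printf '%s\n' '  module Foo' '  end' > app.rb
git commit -q -am "Indent Foo"

# -S lists commits where the number of occurrences of the string changed.
pickaxe_log=$(git log --format=%s -S 'module Foo')
printf 'with -S:\n%s\n' "$pickaxe_log"

# -G lists commits whose added or removed lines match the regex.
regex_log=$(git log --format=%s -G '^[[:space:]]*module[[:space:]]+')
printf 'with -G:\n%s\n' "$regex_log"
```

Here -S only reports the commit that introduced the module, while -G also reports the re-indentation because the changed lines match the pattern.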

Again note that this is just a quick-n-dirty way to eliminate noise and for many use cases this is enough, but we should dream about a more semantic world.