Getting involved

GitHub Issues

If you've got a change, request, or problem to report, it might be easiest to just add an issue at GitHub. Issues usually get resolved pretty quickly in a snapshot build.

This is in no way meant to discourage you from contributing code. If you're still interested, read on!

Patches and Pull Requests

Patches and Pull Requests are great ways to submit code, and GitHub issues are the preferred mechanism for discussing development issues (so everyone can see the discussion and weigh in if they care).

It is strongly recommended to submit smaller changes that are easily pulled in and merged. Major overhauls of huge chunks of code are not likely to be pulled in without some serious discussion and planning.

Conventions, Norms, and Design Goals

gedcom4j has a number of design goals, conventions, and norms that guide the direction of the library. Knowing these will help you write code that is consistent with the codebase and will make incorporating your changes easy and seamless. If you don't agree with this material, that's fine: make the case for why it should be otherwise and things can be changed!

Adherence to the GEDCOM 5.5 and 5.5.1 specs is paramount
It is critical that things work in accordance with both the 5.5 and 5.5.1 specs. If you are not familiar with the specs, becoming so is highly recommended. It's particularly important that gedcom4j always writes out spec-compliant files. Relaxing things on the parser side is ok so long as it doesn't lead to writing out bad GEDCOM files. Dealing gracefully with improperly formed GEDCOM files from other tools is a very good thing to offer when possible. This also requires knowing and accommodating the many weaknesses of the spec (such as text-based indeterminate dates).
No runtime dependencies
The library must not require any other libraries at runtime that users of gedcom4j have to include in their applications. This includes logging tools like log4j, at this time. Compile-time dependencies are ok.
Version N-1 support for Java
For a long time, gedcom4j tried to maintain support for the oldest available versions of Java, in acknowledgement that not everyone can use the latest version of Java. Upon further reflection, it became apparent that anyone who is using gedcom4j is not being constrained by any organizational standards committee that won't allow users to use the more modern JDKs. So now the plan is to target language features from one version back of Java. At this writing, this means JDK 7.
No writing to System.out or System.err at runtime
Writing to System.out or System.err during JUnit test execution is ok, but if gedcom4j itself writes to System.out or System.err at runtime, consumers of gedcom4j cannot use those streams for their own output without gedcom4j's "noise" getting blended in.
All code must have a unit test
Ideally, you'd write the test first, then write the code. Either way, 100% class coverage and at least 70% line coverage are a must. All unit tests must pass.
All code must have complete and well-formed javadoc
All methods, fields, classes, etc. must have javadoc, with no missing tags, and descriptive text for each tag. The build must issue no Javadoc warnings.
Language neutrality and deliberate absence of UI
The primary author of gedcom4j is American, speaks only English, and is aware of the biases that introduces to the code. The GEDCOM spec itself is Eurocentric and biased towards English speakers as well. That said, a goal of gedcom4j is to remain as language-neutral as possible, and this includes avoiding String-based values generated by the library that need to be interpreted by the caller, or that might not be appropriate for a UI in a non-English language. In fact, avoiding UI-like features is a deliberate design choice: gedcom4j should have an API, not a UI. The presentation tier should be solely the concern of the user of the library, and the library should make it easy for consumers to do what they want in their UI.
Member sorting
Member sort order should be inner types; then static fields, initializers, and methods; then regular fields, initializers, constructors, and methods. Within these categories, members should be sorted in public, protected, package, and private order. Finally, within those orders, members should generally be sorted in alphabetical order.
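
Several of the conventions above are easier to see in a small class than to describe. What follows is only a sketch: ValueCleaner, its Problem enum, and MAX_LENGTH are hypothetical names invented for illustration and are not part of gedcom4j. It tries to show the member sort order, complete Javadoc on every member, reporting problems as codes instead of English strings, and collecting those problems for the caller instead of writing to System.out or System.err.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical example class (not part of gedcom4j) that tidies a single text value and records any problems found.
 */
final class ValueCleaner {

    /**
     * Problems that can be recorded. Callers receive codes they can map to their own UI text, not English strings.
     */
    public enum Problem {
        /** The value was null or contained only whitespace. */
        BLANK_VALUE,
        /** The trimmed value was longer than {@link ValueCleaner#MAX_LENGTH}. */
        VALUE_TOO_LONG
    }

    /**
     * Longest trimmed value accepted without recording a problem. The number is arbitrary and for illustration only.
     */
    public static final int MAX_LENGTH = 80;

    /**
     * Checks whether a value is null or contains only whitespace.
     *
     * @param value
     *            the value to check, which may be null
     * @return true if the value is null or blank
     */
    public static boolean isBlank(String value) {
        return value == null || value.trim().isEmpty();
    }

    /**
     * Problems recorded so far. Nothing is written to System.out or System.err.
     */
    private final List<Problem> problems = new ArrayList<>();

    /**
     * Trims a value, recording a problem if it is blank or too long.
     *
     * @param value
     *            the raw value, which may be null
     * @return the trimmed value, or an empty string if the value was blank
     */
    public String clean(String value) {
        if (isBlank(value)) {
            problems.add(Problem.BLANK_VALUE);
            return "";
        }
        String trimmed = value.trim();
        if (trimmed.length() > MAX_LENGTH) {
            problems.add(Problem.VALUE_TOO_LONG);
        }
        return trimmed;
    }

    /**
     * Gets the problems recorded so far.
     *
     * @return an unmodifiable view of the recorded problems, in the order they occurred
     */
    public List<Problem> getProblems() {
        return Collections.unmodifiableList(problems);
    }
}
```

The inner type comes first, then the static field and static method, then the instance field, then the instance methods in alphabetical order. The code uses only the standard library and nothing newer than JDK 7 language features (the diamond operator).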

Design debates and current positions

The following topics come up repeatedly as potential approaches, and discussions go back-and-forth. Here's where things stand currently.

Visitor pattern
The Visitor pattern has been repeatedly evaluated, attempted, and abandoned for use in the parser, validator, and writers. The main reasons for not using the Visitor pattern here are:
  1. Visitor is complex, but this alone is not reason enough to avoid it. GEDCOM parsing is complex too. Implementing it must be worth the trouble, however.
  2. Visitor requires an interface and implementations that have a visit method for each class in the data model. Currently this is over 50 classes, which is a lot of methods to implement for a visiting class.
  3. Visitor is best used over a stable and unchanging data model. The model classes have been changing and each change would have significant secondary effects on the visitors.
  4. Visitor is best used when there are multiple operations that need to visit the object model. gedcom4j has two use cases for visiting the data model: writing files and validating data. Admittedly, users of gedcom4j could conceivably write their own visitors over the data model if the interfaces were publicly visible, and that is the best argument in favor of implementing Visitor.
  5. Visitor deals with all the visits but does not (by itself) traverse the object graph, nor would most users actually want to visit the entire object model. Most users are interested in a particular type of data (most likely Individuals and Families) and don't really want to write code to traverse through and visit Repositories and Sources and Submitters. If you're interested in a specific type of data, Visitor is a lot of overhead to ignore.
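
To make points 2 and 5 more concrete, here is a minimal sketch of what a Visitor over the model might look like. Everything in it is hypothetical: ModelVisitor, Visitable, and the five tiny model classes are simplified stand-ins written for this illustration, not the real gedcom4j model classes, and the real model has more than 50 classes where this sketch has five.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical visitor interface: one visit method per model class. */
interface ModelVisitor {
    void visit(Individual individual);
    void visit(Family family);
    void visit(Repository repository);
    void visit(Source source);
    void visit(Submitter submitter);
}

/** Hypothetical contract that every model class would have to implement. */
interface Visitable {
    void accept(ModelVisitor visitor);
}

/** Simplified stand-in for an individual record. */
final class Individual implements Visitable {
    final String name;
    Individual(String name) { this.name = name; }
    @Override public void accept(ModelVisitor visitor) { visitor.visit(this); }
}

/** Simplified stand-in for a family record. */
final class Family implements Visitable {
    @Override public void accept(ModelVisitor visitor) { visitor.visit(this); }
}

/** Simplified stand-in for a repository record. */
final class Repository implements Visitable {
    @Override public void accept(ModelVisitor visitor) { visitor.visit(this); }
}

/** Simplified stand-in for a source record. */
final class Source implements Visitable {
    @Override public void accept(ModelVisitor visitor) { visitor.visit(this); }
}

/** Simplified stand-in for a submitter record. */
final class Submitter implements Visitable {
    @Override public void accept(ModelVisitor visitor) { visitor.visit(this); }
}

/** A visitor that only cares about individuals still has to implement every method. */
final class IndividualNameCollector implements ModelVisitor {
    final List<String> names = new ArrayList<>();
    @Override public void visit(Individual individual) { names.add(individual.name); }
    @Override public void visit(Family family) { /* not interested */ }
    @Override public void visit(Repository repository) { /* not interested */ }
    @Override public void visit(Source source) { /* not interested */ }
    @Override public void visit(Submitter submitter) { /* not interested */ }
}
```

Note that nothing here walks the object graph. The caller still has to loop over the records and call accept on each one, and a visitor that wants only names carries four empty methods even in this five-class toy model.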

In light of the above, it just doesn't seem worth the trouble at this time. That said, Visitor is a great design pattern and the above in no way is meant to dismiss it as a useful pattern in the right circumstances. There are often several strong (near-religious) opinions on this, so it also seems likely the topic will be re-evaluated at some point in the future.

Place validation and Geolocation
While place name validation and geolocation are valuable and cool, they have two main problems in the context of gedcom4j:
  1. They would require a runtime dependency on a third-party solution of some kind.
  2. Most geolocation data is based only on current geographic boundaries and does not address historical changes. Handling those changes is beyond what gedcom4j can reasonably accomplish.


The author uses Eclipse, but that's not strictly necessary. If you do use Eclipse though, you are encouraged to incorporate the following items and configurations:

  • The Eclipse .project, .configuration, and .settings metadata are checked into GitHub and should make importing into your workspace easy.
  • PMD, Checkstyle, and FindBugs plug-ins: the configurations for these tools are checked into GitHub along with the source.
  • There is a preference file in the etc directory that includes the Eclipse code formatter settings that should be used.
  • Code is supposed to work under Windows, *nix, and Mac. The project owner uses a Mac, so if there are biases or incompatibilities there, please fix them or log an issue at GitHub.