Import Patterns

This is the guide for developing Import Patterns.  If you’re looking for the list of premade patterns, please see our Import Patterns Category.

Notes pattern

Import patterns are written as regular expressions (regex). Because the notes pattern is determined primarily by your Kindle software version, one of the built-in packages should work for you.

However, some users will upload personal documents or other files that require a modified Import Pattern.  For example, the Kindle uses the pattern “Title (Author)” for standard documents.  Personal documents, however, use the file name (minus the extension) when exporting to “My Clippings”.  If that file name happens to contain text in parentheses, that text may be incorrectly treated as the author.
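
As a rough illustration of that failure mode, the sketch below applies a simplified “Title (Author)” expression to a standard title line and to a personal document's file name.  The expression, the helper name split_title, and the sample titles are assumptions made for this example; they are not taken from the built-in patterns.

```python
import re

# Hypothetical, simplified title-line pattern: "Title (Author)".
TITLE_LINE = re.compile(r"^(?P<Book>.+) \((?P<Author>[^()]*)\)$")


def split_title(line):
    """Return (book, author); author is None when no trailing parentheses match."""
    match = TITLE_LINE.match(line)
    if match is None:
        return line, None
    return match.group("Book"), match.group("Author")


# A standard document: the parentheses really do hold the author.
standard = split_title("Moby Dick (Herman Melville)")

# A personal document: the file name ends in parentheses, so the same
# expression reports "draft 2" as the author.
personal = split_title("Meeting Notes (draft 2)")
```

A modified pattern for personal documents could simply drop the Author group so that the whole line is captured as the book.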

Many regular expression tutorials are available on the Internet, so we will not provide a detailed tutorial on the process.  Instead, we have provided several links at the bottom of this article.

However, when designing an Import Pattern, you should be aware that the software will only match and extract the following named groups:

  • <Book>
  • <Author>
  • <Type>
  • <Page>
  • <Location>
  • <Date>
  • <Text>
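
To show how these named groups fit together, here is a minimal sketch in Python's regular expression syntax, where a named group is written (?P<Name>...).  The sample entry and the pattern itself are assumptions for illustration only; the real “My Clippings” layout varies by Kindle software version, and the built-in packages may be written differently.

```python
import re

# Sample entry in an assumed "My Clippings" layout (illustrative only).
SAMPLE = (
    "Moby Dick (Herman Melville)\n"
    "- Highlight on Page 12 | Loc. 180-82 | Added on Monday, March 05, 2012, 09:15 PM\n"
    "\n"
    "Call me Ishmael.\n"
    "==========\n"
)

# Hypothetical pattern using all seven named groups listed above.
# re.DOTALL lets the Text group span several lines; the single-line
# groups use [^\n] so they cannot run past the end of their line.
PATTERN = re.compile(
    r"(?P<Book>[^\n]+) \((?P<Author>[^()\n]*)\)\n"
    r"- (?P<Type>\w+) on Page (?P<Page>\d+) \| Loc\. (?P<Location>[\d-]+)"
    r" \| Added on (?P<Date>[^\n]+)\n"
    r"\n"
    r"(?P<Text>.*?)\n"
    r"==========",
    re.DOTALL,
)

# groupdict() returns a dict keyed by the group names.
fields = PATTERN.search(SAMPLE).groupdict()
```

Here fields["Date"] still holds the raw date string; turning it into a date is the job of the date format described in the next section.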

Date format

As of v0.6, the system supports two different date matching patterns:

  • The original and default is the pattern found in QDateTime.  Details of that pattern can be found in the documentation for Qt.  Unfortunately, Qt will not match the short year ‘yy’ to years after 1999.  After reporting the bug to Qt, the author was told that they do not intend to fix it (inconceivable more than a decade into the 21st century).
  • To ensure that we could support short years, v0.6 added support for Python’s datetime.strptime() method and its associated pattern.  Documentation for this matching pattern can be found in Python’s documentation.

The software decides which pattern to use based on the presence of the % character.  If this character is present, the Python method is used.  If the character is absent, the Qt method is used.  If your pattern needs a literal % character (for any reason), you must use the Python format and write it as %%, the code that explicitly matches a % character.
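
The selection rule and the short-year behaviour can be sketched as follows.  The function names uses_python_format and parse_python_date are hypothetical and only mirror the rule as described in this section; they are not DaleyKlippings' actual code.  Only the Python branch is shown, since the Qt branch requires Qt itself.

```python
from datetime import datetime


def uses_python_format(date_pattern):
    """Mirror the rule described above: a % character selects strptime()."""
    return "%" in date_pattern


def parse_python_date(text, date_pattern):
    """Parse text with datetime.strptime(); only valid for %-style patterns."""
    if not uses_python_format(date_pattern):
        raise ValueError("Qt-style pattern; strptime() cannot parse it")
    return datetime.strptime(text, date_pattern)


# %y is the two-digit year; strptime() maps 12 to 2012.
parsed = parse_python_date("03/05/12 21:15", "%m/%d/%y %H:%M")

# A Qt-style pattern contains no % character, so it selects the Qt method.
qt_style = uses_python_format("MM/dd/yy hh:mm")
```

In strptime(), two-digit years from 00 to 68 are read as 2000 to 2068 and 69 to 99 as 1969 to 1999, which is what makes the Python format usable for recent short-year dates.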

Encoding

By default, we encourage users to select utf-8.  The system will automatically check utf-16 and windows-1252 if the configured encoding fails.  Selecting utf-8 catches any files in that format before falling back on the default methods.

The only reason to select another encoding is to address issues where utf-8 (and the built-in defaults) fail to correctly process your files.
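
The fallback order described above can be pictured with the following minimal sketch.  The function name decode_clippings and the exact control flow are assumptions for illustration; the real implementation may differ.

```python
# Fallbacks named in this section, tried after the configured encoding.
FALLBACK_ENCODINGS = ("utf-16", "windows-1252")


def decode_clippings(raw, configured="utf-8"):
    """Try the configured encoding first, then each fallback in turn.

    Returns (text, encoding_used).
    """
    candidates = [configured] + [e for e in FALLBACK_ENCODINGS if e != configured]
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # this encoding rejected the bytes; try the next one
    raise ValueError("could not decode with any of: " + ", ".join(candidates))


# A utf-8 file is caught by the configured encoding.
text_a, used_a = decode_clippings("naïve café".encode("utf-8"))

# A utf-16 file (with byte order mark) fails as utf-8 and falls through.
text_b, used_b = decode_clippings("naïve café".encode("utf-16"))
```

Because a decoding error is the only signal in this sketch, some files can decode without error under the wrong encoding and produce garbled text, which is the kind of case where selecting a specific encoding helps.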

Resources

As DaleyKlippings is written in Python, the Python Regular Expression Operations page is the authoritative source on the operations supported by DaleyKlippings.