Jan 03

Import Pattern for Kindle Paperwhite 5.3.1

This pattern addresses another date-time issue. A user provided a file from the Paperwhite that included date-times in the following format:

Added on Monday, 23 April 12 22:51:41

While this is roughly the same date-time pattern found in the previous GMT example, it was necessary to make more significant changes to the expression around the Date tag. In previous patterns we could depend on AM/PM or GMT to clearly indicate the last characters in the date-time. This pattern has no obvious terminator. Instead, we had the Date tag include everything up to the end-of-line characters “\r\n”. The regex construct [^…] matches any character except those listed in place of the ellipsis, so [^\r\n]* consumes everything up to, but not including, the line ending.
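As a rough illustration of the negated character class, the Python sketch below applies only the Date portion of the pattern to a sample line. The sample line (page number and text) is made up for the example, and the flags are assumed to be the VERBOSE and UNICODE options mentioned in the pattern comments.

```python
import re

# Invented sample in the Paperwhite style; only the date string comes from the post.
line = ("- Your Highlight on Page 12 | Added on Monday, 23 April 12 22:51:41"
        "\r\n\r\nSome highlighted text")

# [^\r\n]* matches any run of characters that are neither \r nor \n,
# so the capture stops right before the line ending.
date_re = re.compile(r"Added\ on\ (?P<Date>[^\r\n]*)", re.VERBOSE | re.UNICODE)

date_text = date_re.search(line).group("Date")
print(date_text)  # Monday, 23 April 12 22:51:41
```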

Notes delimiter:

==========

Notes pattern:

# Import notes, highlights and bookmarks from "My Clippings.txt"

# Note that VERBOSE and UNICODE options are always on

^\s* #
(?P<Book>.*?) # Book name
(\s*\((?P<Author>[^\(]*)\))? # Author name (optional)
\s*-\ Your\ #
(?P<Type>(Highlight|Note|Bookmark)) # Clipping type - 'Highlight', 'Note' or 'Bookmark'
(\ on\ Page\ #
(?P<Page>[\d-]*)\ \|)? # Page (optional)
(.*(Location|Loc\.)\ #
(?P<Location>[\d-]*))? # Location (optional)
.*?Added\ on\ #
(?P<Date>([^\r\n]*)) # Date & time
\s* #
(?P<Text>.*?) # Text
\s*$ #

Date Format:

%A, %d %B %y %H:%M:%S
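To check a date format like this one outside the application, it can be fed to Python's datetime.strptime, which uses the same format codes. This is only a quick sanity check, not the importer's own code; note that %A and %B match day and month names in the current locale, so an English (or default C) locale is assumed.

```python
from datetime import datetime

DATE_FORMAT = "%A, %d %B %y %H:%M:%S"

# %y expands the two-digit year 12 to 2012.
parsed = datetime.strptime("Monday, 23 April 12 22:51:41", DATE_FORMAT)
print(parsed.isoformat())  # 2012-04-23T22:51:41
```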

Encoding:

UTF-8 (all languages)

Dec 21

Import Pattern for Kindle 3

The original Kindle 3 pattern had a bug so this improved pattern was included in v0.7.  Note that this version excludes bookmarks.

Notes delimiter:

==========

Notes pattern:

# Import notes and highlights from "My Clippings.txt" and ignore
# bookmarks. Warnings with information on ignored bookmarks
# will be added to the log - this is the app normal behaviour
# Note that VERBOSE and UNICODE options are always on
^\s*                         #
(?P<Book>.*?)                # Book name
(\s*\((?P<Author>[^\(]*)\))? # Author name (optional)
\s*-\                        #
(?P<Type>(Highlight|Note))   # Clipping type - 'Highlight' or 'Note'
(\ on\ Page\                 #
(?P<Page>[\d-]*)\ \|)?       # Page (optional)
(.*(Location|Loc\.)\         #
(?P<Location>[\d-]*))?       # Location (optional)
.*?Added\ on\                #
(?P<Date>(.*)(AM|PM))        # Date & time
\s*                          #
(?P<Text>.*?)                # Text
\s*$                         #

Date Format – This field is left empty because the default matching pattern works with everything we tested.


Encoding – While we pick UTF-8, most files aren’t encoded this way. However, as of v0.7 the system automatically tries UTF-16 and Windows-1252 if the configured encoding fails. By selecting UTF-8, we catch anything that happens to be encoded in UTF-8 before falling back on UTF-16 and Windows-1252.
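The fallback order described here can be sketched in a few lines of Python. This is not the application's actual code: the function name decode_clippings and the tuple FALLBACK_ENCODINGS are hypothetical, and the sketch simply returns the first encoding that decodes without an error.

```python
# Hypothetical names; the order mirrors the one described in the text.
FALLBACK_ENCODINGS = ("utf-8", "utf-16", "windows-1252")

def decode_clippings(raw, encodings=FALLBACK_ENCODINGS):
    """Return (text, encoding) for the first encoding that decodes raw cleanly."""
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # try the next candidate
    raise ValueError("none of the candidate encodings could decode the data")

text, used = decode_clippings("café".encode("utf-16"))
print(used)  # utf-16
```

Trying UTF-8 first is the safe choice because it is strict: bytes from the other two encodings rarely form valid UTF-8. A clean decode is not proof that the guess was right, though, so the order of the candidates matters.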

UTF-8 (all languages)

Dec 19

Import Pattern for Kindle 4 showing GMT, with bookmarks

Some notes about this pattern before we start. If you just want to use the pattern (included in the installer from v0.6 on), skip down to the notes delimiter header.

A user had a My Clippings file that included date/times in the following format:

Added on Monday, 23 April 12 22:51:41 GMT+01:00

This was a significant difference from the US Kindle Touch and, unfortunately, was unsupported by the library used to parse dates. My solution was to exclude the “GMT+01:00” part of the match on the assumption that DaleyKlippings would correct for it by using the user’s local time. Even if this isn’t true, the difference should be trivial in the greater scheme of things.

A few other differences are present (especially if you’re using these samples to learn):

  • This pattern excludes the phrase “Your ” (including the trailing space) in front of the word Note/Highlight/Bookmark
  • This pattern includes bookmarks by adding the phrase inside the Type tag, e.g. (Highlight|Note|Bookmark)
  • Location was presented as Loc. so this was added to the matching text as a valid option
  • Note that “GMT” is outside the (?P<Date>…) tag.  This ensures that it is properly matched by the regular expression, but excludes it from the tag.
  • Because we had a two-digit year and there is a bug in our standard date/time object (QDateTime), this pattern uses a date format compatible with Python’s datetime object (see the Python datetime documentation for the full list of format codes).  The code should automatically recognize and use the right parser based on your date/time pattern.
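The effect of keeping “GMT” outside the Date tag can be seen with a small Python sketch. It uses only the Date and GMT lines of the pattern and assumes the VERBOSE and UNICODE flags mentioned in the pattern comments; the variable names are made up for the example.

```python
import re
from datetime import datetime

date_re = re.compile(r"""
    Added\ on\               # literal prefix
    (?P<Date>(.*))           # date & time, captured in the Date tag
    (\ GMT(\+|\-)[0-9:]*)    # GMT offset: must match, but is not part of Date
""", re.VERBOSE | re.UNICODE)

line = "Added on Monday, 23 April 12 22:51:41 GMT+01:00"
date_text = date_re.search(line).group("Date")
print(date_text)  # Monday, 23 April 12 22:51:41

# The offset is dropped, so the result is a naive datetime.
parsed = datetime.strptime(date_text, "%A, %d %B %y %H:%M:%S")
print(parsed.isoformat())  # 2012-04-23T22:51:41
```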

Notes delimiter:

==========

Notes pattern:

# Import notes, highlights and bookmarks from "My Clippings.txt"

# Note that VERBOSE and UNICODE options are always on

^\s*                           #
(?P<Book>.*?)                  # Book name
(\s*\((?P<Author>[^\(]*)\))?   # Author name (optional)
\s*-\                          #
(?P<Type>(Highlight|Note|Bookmark))     # Clipping types accepted
(\ on\ Page\                   #
(?P<Page>[\d-]*)\ \|)?         # Page (optional)
(.*(Location|Loc\.)\           #
(?P<Location>[\d-]*))?         # Location (optional)
.*?Added\ on\                  #
(?P<Date>(.*))                 # Date & time
(\ GMT(\+|\-)[0-9:]*)          # Padding to exclude GMT and +/- number
\s*                            #
(?P<Text>.*?)                  # Text
\s*$                           #

Date Format:

%A, %d %B %y %H:%M:%S

Encoding:

UTF-8 (all languages)

Dec 17

Import Pattern for Kindle(R) Touch v5.1.2

This import pattern has been tested on a Kindle Touch running software version 5.1.2.  It may work on different hardware or older/newer software, but you are encouraged to find (or request) a pattern designed especially for your Kindle.

The pattern is designed to import Notes and Highlights, but not Bookmarks.  It attempts to parse the book name into “Book (Author)”, the standard Kindle format.  This usually does not work on Kindle documents, in which case the complete book name (including the author, if listed) is stored in the Book field.
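The “Book (Author)” split can be tried out with the opening lines of the pattern alone. The Python sketch below is an illustration under assumptions: the helper split_title and both sample clippings are invented, and the flags are assumed to be VERBOSE and UNICODE as stated in the pattern comments.

```python
import re

# Only the opening lines of the pattern are reproduced here.
header_re = re.compile(r"""
    ^\s*                          #
    (?P<Book>.*?)                 # Book name
    (\s*\((?P<Author>[^\(]*)\))?  # Author name (optional)
    \s*-\ Your\                   #
    (?P<Type>(Highlight|Note))    # Clipping type
""", re.VERBOSE | re.UNICODE)

def split_title(clipping):
    """Return the (Book, Author) groups for one clipping (hypothetical helper)."""
    match = header_re.match(clipping)
    return match.group("Book"), match.group("Author")

# Standard "Book (Author)" title: the author is split off.
print(split_title("Some Book (Jane Doe)\n- Your Highlight on Page 5"))
# Document title without parentheses: everything stays in Book.
print(split_title("Meeting notes by J. Smith\n- Your Note on Page 2"))
```

When the title has no parenthesised author, the optional Author group simply does not take part in the match, which is why the whole first line ends up in Book.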

Notes delimiter:

==========

Notes pattern:

# Import notes and highlights from "My Clippings.txt" and ignore
# bookmarks. Warnings with information on ignored bookmarks
# will be added to the log - this is the app normal behaviour

# Note that VERBOSE and UNICODE options are always on

^\s*                           #
(?P<Book>.*?)                  # Book name
(\s*\((?P<Author>[^\(]*)\))?   # Author name (optional)
\s*-\ Your\                    #
(?P<Type>(Highlight|Note))     # Clipping type - 'Highlight' or 'Note'
(\ on\ Page\                   #
(?P<Page>[\d-]*)\ \|)?         # Page (optional)
(.*(Location)\                 #
(?P<Location>[\d-]*))?         # Location (optional)
.*?Added\ on\                  #
(?P<Date>(.*)(AM|PM))          # Date & time
\s*                            #
(?P<Text>.*?)                  # Text
\s*$                           #

Date Format:

dddd, MMMM d, yyyy h:mm:ss A
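For readers more familiar with Python's format codes, the Qt-style format above corresponds roughly to the strptime format in the sketch below. The mapping and the sample date string are assumptions made for illustration (the sample is inferred from the format, not taken from a real Kindle Touch file), and English day and month names are assumed.

```python
from datetime import datetime

# Qt-style format from this pattern:  dddd, MMMM d, yyyy h:mm:ss A
# Approximate strptime counterpart:   %A, %B %d, %Y %I:%M:%S %p
STRPTIME_EQUIVALENT = "%A, %B %d, %Y %I:%M:%S %p"

sample = "Monday, April 23, 2012 10:51:41 PM"  # made-up sample
parsed = datetime.strptime(sample, STRPTIME_EQUIVALENT)
print(parsed.isoformat())  # 2012-04-23T22:51:41
```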

Encoding:

UTF-8 (all languages)