Dec 19

Import Pattern for Kindle 4 showing GMT, with bookmarks

Some notes about this pattern before we start. If you just want to use the pattern (included in the installer from v0.6 on), skip down to the notes delimiter header.

A user had a My Clippings file that included dates and times in the following format:

Added on Monday, 23 April 12 22:51:41 GMT+01:00

This was a significant difference from the US Kindle Touch format and, unfortunately, the “GMT+01:00” suffix was not supported by the library used to parse dates. My solution was to exclude the “GMT+01:00” part from the match on the assumption that DaleyKlippings would correct for it by using the user’s local time. Even if this isn’t true, the resulting error is limited to the size of the offset (one hour in this example), which should be trivial in the greater scheme of things.
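
To make that trade-off concrete, the Python sketch below separates the offset from the rest of the timestamp. It is an illustration only: DaleyKlippings drops the suffix inside the notes pattern itself, and `split_gmt_suffix` is a hypothetical helper. It shows that the discarded part is a fixed UTC offset, so the parsed time is the Kindle's wall-clock time and any error is bounded by that offset.

```python
import re
from datetime import timedelta
from typing import Optional, Tuple

# Hypothetical helper (not part of DaleyKlippings): matches a trailing
# " GMT+hh:mm" or " GMT-hh:mm" suffix like the one in the sample above.
GMT_SUFFIX = re.compile(r"\s*GMT([+-])(\d{2}):(\d{2})\s*$")


def split_gmt_suffix(stamp: str) -> Tuple[str, Optional[timedelta]]:
    """Return the timestamp without its GMT suffix, plus the offset it carried."""
    match = GMT_SUFFIX.search(stamp)
    if match is None:
        return stamp, None  # nothing to strip
    sign = 1 if match.group(1) == "+" else -1
    offset = sign * timedelta(hours=int(match.group(2)), minutes=int(match.group(3)))
    return stamp[: match.start()], offset


text, offset = split_gmt_suffix("Monday, 23 April 12 22:51:41 GMT+01:00")
print(text)    # Monday, 23 April 12 22:51:41
print(offset)  # 1:00:00
```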

A few other differences are present (especially if you’re using these samples to learn):

  • This pattern excludes the phrase “Your ” (including the trailing space) in front of the word Note/Highlight/Bookmark
  • This pattern includes bookmarks by adding the phrase inside the Type tag, e.g. (Highlight|Note|Bookmark)
  • Location was presented as “Loc.”, so this abbreviation was added to the matching text as a valid alternative, i.e. (Location|Loc\.)
  • Note that “GMT” is outside the (?P<Date>…) tag. This ensures that the regular expression still matches it, but leaves it out of the Date field.
  • Because the timestamps use a two-digit year, and our standard date/time parser (QDateTime) has a bug that prevents it from matching two-digit years, this pattern uses a date format compatible with Python’s datetime object (see the documentation for Python’s datetime object for the full list of format codes). DaleyKlippings automatically picks the right parser based on your date format: a format containing the % character is handled by Python’s datetime, anything else by QDateTime.
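
The two-digit year is the part QDateTime stumbled on. As a quick check of the standard-library behavior this pattern relies on (plain `datetime.strptime`, not DaleyKlippings code), the date format below turns the captured Date text into a full timestamp, and `%y` maps 00 to 68 onto 2000 to 2068 and 69 to 99 onto 1969 to 1999. Day and month names are matched against the current locale, so this assumes English names and Python's default C locale.

```python
from datetime import datetime

# Date format from this post's pattern; the GMT suffix is not part of the Date text.
KINDLE4_DATE_FORMAT = "%A, %d %B %y %H:%M:%S"

parsed = datetime.strptime("Monday, 23 April 12 22:51:41", KINDLE4_DATE_FORMAT)
print(parsed)  # 2012-04-23 22:51:41

# %y pivots at 69: "68" becomes 2068, "69" becomes 1969.
print(datetime.strptime("68", "%y").year)  # 2068
print(datetime.strptime("69", "%y").year)  # 1969
```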

Notes delimiter:

==========

Notes pattern:

# Import notes, highlights, and bookmarks from "My Clippings.txt"
# (Kindle 4 format with a " GMT+hh:mm" or " GMT-hh:mm" timestamp suffix)

# Note that VERBOSE and UNICODE options are always on

^\s*                           #
(?P<Book>.*?)                  # Book name
(\s*\((?P<Author>[^\(]*)\))?   # Author name (optional)
\s*-\                          #
(?P<Type>(Highlight|Note|Bookmark))     # Clipping types accepted
(\ on\ Page\                   #
(?P<Page>[\d-]*)\ \|)?         # Page (optional)
(.*(Location|Loc\.)\           #
(?P<Location>[\d-]*))?         # Location (optional)
.*?Added\ on\                  #
(?P<Date>(.*))                 # Date & time
(\ GMT(\+|\-)[0-9:]*)          # Padding to exclude GMT and +/- number
\s*                            #
(?P<Text>.*?)                  # Text
\s*$                           #

Date Format:

%A, %d %B %y %H:%M:%S

Encoding:

UTF-8 (all languages)
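
To try the pattern outside DaleyKlippings, the Python sketch below splits a file on the delimiter, applies the notes pattern with the VERBOSE and UNICODE flags, and parses the Date field with the date format above. The clipping text in `SAMPLE` is hypothetical, made up to follow the Kindle 4 layout described in this post, and `parse_clippings` is an illustrative name, not DaleyKlippings' importer.

```python
import re
from datetime import datetime
from typing import Dict, List

# Notes pattern from this post. VERBOSE ignores the alignment spaces and
# "#" comments, while escaped spaces ("\ ") still match a literal space.
NOTES_PATTERN = r"""
^\s*                           #
(?P<Book>.*?)                  # Book name
(\s*\((?P<Author>[^\(]*)\))?   # Author name (optional)
\s*-\                          #
(?P<Type>(Highlight|Note|Bookmark))     # Clipping types accepted
(\ on\ Page\                   #
(?P<Page>[\d-]*)\ \|)?         # Page (optional)
(.*(Location|Loc\.)\           #
(?P<Location>[\d-]*))?         # Location (optional)
.*?Added\ on\                  #
(?P<Date>(.*))                 # Date & time
(\ GMT(\+|\-)[0-9:]*)          # Padding to exclude GMT and +/- number
\s*                            #
(?P<Text>.*?)                  # Text
\s*$                           #
"""
DATE_FORMAT = "%A, %d %B %y %H:%M:%S"
DELIMITER = "=========="

# Hypothetical "My Clippings.txt" excerpt in the Kindle 4 layout.
SAMPLE = (
    "Some Book (Some Author)\n"
    "- Highlight Loc. 245-46 | Added on Monday, 23 April 12 22:51:41 GMT+01:00\n"
    "\n"
    "Highlighted text here.\n"
    "==========\n"
)


def parse_clippings(contents: str) -> List[Dict[str, object]]:
    regex = re.compile(NOTES_PATTERN, re.VERBOSE | re.UNICODE)
    clippings = []
    for chunk in contents.split(DELIMITER):
        if not chunk.strip():
            continue  # e.g. the empty tail after the last delimiter
        match = regex.match(chunk)
        if match is None:
            continue  # DaleyKlippings logs a warning instead of silently skipping
        fields = match.groupdict()
        fields["Date"] = datetime.strptime(fields["Date"], DATE_FORMAT)
        clippings.append(fields)
    return clippings


for clipping in parse_clippings(SAMPLE):
    print(clipping["Type"], clipping["Location"], clipping["Date"], clipping["Text"])
    # Highlight 245-46 2012-04-23 22:51:41 Highlighted text here.
```
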
Dec 19

DaleyKlippings v0.6

This version includes several improvements and bug fixes:

  • Added matching logic so location ranges like 245-6 are processed correctly.
  • Upgraded the built-in CSV templates from QuoteSafe to QuoteEscape
  • Added a Kindle 4 importer that supports timestamps ending in GMT+… or GMT-…
  • Added bookmark-friendly importers
  • Altered the date/time matching algorithm. It now supports two modes: Python datetime (if the date format string includes the % character) or QDateTime (if it does not). This works around a bug in QDateTime, which was not matching two-digit years.
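
The two-mode switch in the last bullet can be sketched in Python as follows. This is a minimal illustration, not DaleyKlippings' actual code: `parse_timestamp` and `unavailable_qt_parse` are hypothetical names, and because QDateTime is only reachable through Qt bindings, the Qt branch is passed in as a callable instead of being imported.

```python
from datetime import datetime
from typing import Callable

# Hypothetical signature for a Qt-based parser (for example a wrapper around
# QDateTime); injected so this sketch runs without Qt installed.
QtParser = Callable[[str, str], datetime]


def parse_timestamp(text: str, date_format: str, qt_parse: QtParser) -> datetime:
    """Choose a parser the way the v0.6 notes describe: '%' means Python datetime."""
    if "%" in date_format:
        # strptime-style directives such as %y cope with two-digit years.
        return datetime.strptime(text, date_format)
    # Otherwise treat it as a Qt format such as "dddd, MMMM d, yyyy h:mm:ss A".
    return qt_parse(text, date_format)


def unavailable_qt_parse(text: str, date_format: str) -> datetime:
    raise NotImplementedError("Qt parsing is outside the scope of this sketch")


print(parse_timestamp("Monday, 23 April 12 22:51:41", "%A, %d %B %y %H:%M:%S", unavailable_qt_parse))
# 2012-04-23 22:51:41
```

Injecting the Qt branch keeps the decision rule (the presence of %) separate from either parser, which is the only part the release notes actually specify.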

Dec 17

DaleyKlippings v0.4

This version includes several additional features and bug fixes:

  • Fixed the page and location matching logic to reduce the odds of an error when matching in files that include personal documents
  • Changed the SpanXmlSafe prefix to XmlSafeSpan
    • This ensures consistency with best practices under Export Patterns
    • Updated in the code and in the built-in Evernote pattern

Dec 17

Import Pattern for Kindle(R) Touch v5.1.2

This import pattern has been tested on a Kindle Touch running software version 5.1.2. It may work on other hardware or older/newer software, but you are encouraged to find (or request) a pattern designed especially for your Kindle.

The pattern is designed to import Notes and Highlights, but not bookmarks. It attempts to split the title line into “Book (Author)”, the standard Kindle format. This usually does not work on Kindle documents; in that case the complete book name (including the author, if listed) is stored in the Book field.

Notes delimiter:

==========

Notes pattern:

# Import notes and highlights from "My Clippings.txt" and ignore
# bookmarks. Warnings with information on ignored bookmarks
# will be added to the log - this is the app normal behaviour

# Note that VERBOSE and UNICODE options are always on

^\s*                           #
(?P<Book>.*?)                  # Book name
(\s*\((?P<Author>[^\(]*)\))?   # Author name (optional)
\s*-\ Your\                    #
(?P<Type>(Highlight|Note))     # Clipping type - 'Highlight' or 'Note'
(\ on\ Page\                   #
(?P<Page>[\d-]*)\ \|)?         # Page (optional)
(.*(Location)\                 #
(?P<Location>[\d-]*))?         # Location (optional)
.*?Added\ on\                  #
(?P<Date>(.*)(AM|PM))          # Date & time
\s*                            #
(?P<Text>.*?)                  # Text
\s*$                           #

Date Format:

dddd, MMMM d, yyyy h:mm:ss A

Encoding:

UTF-8 (all languages)
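
For readers more used to Python's strptime codes, the Qt date format above can be read token by token. The helper below is a hypothetical illustration, not part of DaleyKlippings (which hands this format to QDateTime): it maps common Qt tokens to their closest strptime equivalents and assumes the format contains no quoted literal text. The sample timestamp is made up to fit the format.

```python
import re
from datetime import datetime

# Hypothetical helper (not part of DaleyKlippings): translates a subset of
# Qt date/time format tokens into Python strptime directives. Longer tokens
# come first in the alternation so "dddd" is not read as four "d" tokens.
QT_TOKEN = re.compile(r"dddd|ddd|dd|d|MMMM|MMM|MM|M|yyyy|yy|HH|H|hh|h|mm|m|ss|s|AP|ap|A|a")


def qt_to_strptime(qt_format: str) -> str:
    # In Qt, h/hh are 12-hour fields only when an AM/PM token is present.
    twelve_hour = "A" in qt_format or "a" in qt_format
    hour = "%I" if twelve_hour else "%H"
    mapping = {
        "dddd": "%A", "ddd": "%a", "dd": "%d", "d": "%d",
        "MMMM": "%B", "MMM": "%b", "MM": "%m", "M": "%m",
        "yyyy": "%Y", "yy": "%y",
        "HH": "%H", "H": "%H", "hh": hour, "h": hour,
        "mm": "%M", "m": "%M", "ss": "%S", "s": "%S",
        "AP": "%p", "ap": "%p", "A": "%p", "a": "%p",
    }
    # Escape literal percent signs first, then swap each Qt token.
    escaped = qt_format.replace("%", "%%")
    return QT_TOKEN.sub(lambda match: mapping[match.group(0)], escaped)


fmt = qt_to_strptime("dddd, MMMM d, yyyy h:mm:ss A")
print(fmt)  # %A, %B %d, %Y %I:%M:%S %p
# Hypothetical Kindle Touch timestamp, used only to exercise the format.
print(datetime.strptime("Monday, April 23, 2012 10:51:41 PM", fmt))  # 2012-04-23 22:51:41
```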

Dec 17

DaleyKlippings v0.3

This version includes several additional features:

  • Added several export formats to the installer:
    • XML (with and without attached notes)
    • CSV (with and without attached notes)
  • Added several additional Prefixes to better support new export formats
    • QuoteSafe – Replaces " with '
    • CommaSafe – Replaces , with _
    • TabSafe – Replaces <tab> with 5 <spaces>
    • Truncate### and Ellipsis###
      • If the input is longer than the three-digit number, both prefixes truncate the input (e.g. Truncate010 truncates to 10 characters while Truncate200 truncates to 200 characters). The prefix must include 3 digits, so numbers below 100 must be zero-padded (e.g. 010 for 10).
      • The only difference is that Ellipsis replaces the final 3 characters with “…”.
      • For example, given the phrase “star indica*tes 11th character boundary”, Truncate011 will return “star indica” and Ellipsis011 will return “star ind…”
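
The character-replacement prefixes and the Truncate###/Ellipsis### rules can be sketched in Python as below. This follows the rules as listed rather than DaleyKlippings' own implementation: `apply_v03_prefix` is a hypothetical name, and it assumes QuoteSafe works on straight ASCII quotes and that Ellipsis uses three periods ("...") so the result keeps the requested length.

```python
import re

# Hypothetical sketch of the v0.3 formatting prefixes described above.
LENGTH_PREFIX = re.compile(r"^(Truncate|Ellipsis)(\d{3})$")


def apply_v03_prefix(prefix: str, value: str) -> str:
    if prefix == "QuoteSafe":
        return value.replace('"', "'")
    if prefix == "CommaSafe":
        return value.replace(",", "_")
    if prefix == "TabSafe":
        return value.replace("\t", " " * 5)
    match = LENGTH_PREFIX.match(prefix)
    if match is None:
        raise ValueError(f"unknown prefix: {prefix}")
    kind, limit = match.group(1), int(match.group(2))  # "011" -> 11
    if len(value) <= limit:
        return value  # only inputs longer than the limit are shortened
    truncated = value[:limit]
    if kind == "Ellipsis":
        # Replace the final 3 characters of the truncated text with "...".
        truncated = truncated[:-3] + "..."
    return truncated


phrase = "star indicates the boundary"
print(apply_v03_prefix("Truncate011", phrase))  # star indica
print(apply_v03_prefix("Ellipsis011", phrase))  # star ind...
```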

Dec 16

DaleyKlippings v0.2

This version includes several minor but important bug fixes:

  • Fixed a bug where the program would fail if any of your notes or highlights were generated by a document without locations.
  • Attached Notes will now list the full location of the highlight. They used to list only the location associated with the note.
  • Several minor core improvements

Dec 15

DaleyKlippings v0.1

This is the first public beta of DaleyKlippings. The file may be downloaded here:

The software is derived from the public domain Klippings (not to be confused with Klippings Kollector). You can get a basic idea of the feature set at the original site. However, I have fixed a slew of bugs and made many enhancements, including:

Advanced matching between notes and highlights:

  • The matching algorithm now uses locations and location ranges instead of dates when determining matches. This should significantly improve the accuracy of the matching process.
  • When attaching a note, the algorithm automatically checks the highlights before and after the note. This expands support to Kindle devices (like the Touch) that list notes before their highlights.
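
A rough Python sketch of location-based matching follows. It assumes the Kindle abbreviates the end of a range by dropping the leading digits it shares with the start (so “245-6” means 245 to 246). Both helper names are hypothetical, and the containment test is a simplification, not DaleyKlippings' actual algorithm: a note is attached to the neighboring highlight whose range contains the note's location, checking the entry before it first and then the entry after it.

```python
from typing import Dict, List, Optional, Tuple


def parse_location(location: str) -> Tuple[int, int]:
    """'245-6' -> (245, 246), '245-46' -> (245, 246), '245' -> (245, 245)."""
    start_text, _, end_text = location.partition("-")
    start = int(start_text)
    if not end_text:
        return start, start
    # Restore the leading digits the Kindle omitted from the end of the range.
    prefix_len = max(0, len(start_text) - len(end_text))
    return start, int(start_text[:prefix_len] + end_text)


def find_highlight_for_note(clippings: List[Dict[str, str]], note_index: int) -> Optional[int]:
    """Index of the highlight just before or after the note whose range contains it."""
    note_start, _ = parse_location(clippings[note_index]["Location"])
    for neighbor in (note_index - 1, note_index + 1):
        if 0 <= neighbor < len(clippings) and clippings[neighbor]["Type"] == "Highlight":
            start, end = parse_location(clippings[neighbor]["Location"])
            if start <= note_start <= end:
                return neighbor
    return None


# Hypothetical entries in Kindle Touch order: the note comes before its highlight.
touch_order = [
    {"Type": "Note", "Location": "246"},
    {"Type": "Highlight", "Location": "245-6"},
]
print(find_highlight_for_note(touch_order, 0))  # 1
```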

Now supports Author, Page, Note, and Highlight fields

  • All of these fields can be used as wildcards when creating export templates
  • Generally speaking, users should continue to use “Text” when creating export patterns because “Note” and “Highlight” will not always have data.

Added “formatting prefixes” – Formatting prefixes make it easier to adjust data based on your output needs. Current prefixes include:

  • XmlSafe – Replaces <, >, and & with their HTML entity equivalents (&lt;, &gt;, and &amp;)
  • EvernoteTag – Enforces Evernote’s tag requirements by replacing all commas with ‘_’ and limiting length to 100 characters
  • Formatting prefixes can be added to any tag to change the output. For example, XmlSafeTitle or XmlSafeText.
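
To show how a prefixed tag such as XmlSafeTitle could be resolved, here is a hypothetical Python sketch: it peels a known prefix off the front of the tag name and applies it to the field value. The prefix rules follow the bullets above; the names `PREFIXES` and `resolve_tag`, the sample field values, and the use of `html.escape` for XmlSafe are assumptions for illustration.

```python
import html
from typing import Callable, Dict

# Hypothetical prefix table based on the v0.1 descriptions above.
PREFIXES: Dict[str, Callable[[str], str]] = {
    # XmlSafe: replace &, < and > with &amp;, &lt; and &gt;
    "XmlSafe": lambda value: html.escape(value, quote=False),
    # EvernoteTag: commas become "_" and the tag is capped at 100 characters
    "EvernoteTag": lambda value: value.replace(",", "_")[:100],
}


def resolve_tag(tag: str, fields: Dict[str, str]) -> str:
    """Resolve a tag like 'XmlSafeTitle' against a dict of field values."""
    for prefix, transform in PREFIXES.items():
        field = tag[len(prefix):]
        if tag.startswith(prefix) and field in fields:
            return transform(fields[field])
    return fields[tag]  # no prefix: plain field lookup (KeyError if unknown)


fields = {"Title": "Tom & Jerry <Vol. 1>", "Author": "Smith, John"}
print(resolve_tag("XmlSafeTitle", fields))       # Tom &amp; Jerry &lt;Vol. 1&gt;
print(resolve_tag("EvernoteTagAuthor", fields))  # Smith_ John
print(resolve_tag("Author", fields))             # Smith, John
```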

Fully customizable Attached Notes

  • Notes no longer use the “highlight delimiter” logic. Instead, each export pattern has an extra input box. This box accepts a pattern that will be applied to Attached Notes.
  • This box uses the same pattern logic as other export windows. It accepts all prefixes and any field EXCEPT “Text”.

Enhanced Text field

  • Despite the changes to Attached Notes and the user interface, the Text field will continue to dynamically choose between “Bookmarks” (empty), “Notes”, “Highlights”, and “Attached Notes” when filling in the field.
  • The Text field accepts all Prefixes with one caveat: because Attached Notes can be fully configured through the Export pattern, prefixes on the Text field are not applied to Attached Notes, only to individual Notes and Highlights.