This is an automated email from the ASF dual-hosted git repository.
mwalch pushed a commit to branch gh-pages
in repository https://gitbox.apache.org/repos/asf/fluo-website.git
The following commit(s) were added to refs/heads/gh-pages by this push:
new 02ba26b Move Fluo Recipes documentation to website (#92)
02ba26b is described below
commit 02ba26be221d3ff897f1d90468afe54bf70a5e28
Author: Mike Walch <[email protected]>
AuthorDate: Mon Oct 2 15:41:01 2017 -0400
Move Fluo Recipes documentation to website (#92)
* Created template and navbar
---
_config.yml | 19 +-
_layouts/recipes-1.2.html | 55 ++++++
_recipes-1-2/getting-started/overview.md | 69 +++++++
_recipes-1-2/index.md | 4 +
_recipes-1-2/recipes/accumulo-export.md | 103 +++++++++++
_recipes-1-2/recipes/combine-queue.md | 210 +++++++++++++++++++++
_recipes-1-2/recipes/export-queue.md | 305 +++++++++++++++++++++++++++++++
_recipes-1-2/recipes/recording-tx.md | 76 ++++++++
_recipes-1-2/recipes/row-hasher.md | 121 ++++++++++++
_recipes-1-2/tools/serialization.md | 76 ++++++++
_recipes-1-2/tools/spark.md | 19 ++
_recipes-1-2/tools/table-optimization.md | 66 +++++++
_recipes-1-2/tools/testing.md | 15 ++
_recipes-1-2/tools/transient.md | 85 +++++++++
14 files changed, 1222 insertions(+), 1 deletion(-)
diff --git a/_config.yml b/_config.yml
index 657cb54..05ef091 100644
--- a/_config.yml
+++ b/_config.yml
@@ -10,6 +10,9 @@ collections:
fluo-docs-1-2:
output: true
permalink: "/docs/fluo/1.2/:path"
+ recipes-1-2:
+ output: true
+ permalink: "/docs/fluo-recipes/1.2/:path"
defaults:
-
@@ -52,6 +55,20 @@ defaults:
docs_base: "/docs/fluo/1.2"
javadoc_base: "https://static.javadoc.io/org.apache.fluo/fluo-api/1.1.0-incubating"
github_base: "https://github.com/apache/fluo/blob/master"
+ -
+ scope:
+ path: "_recipes-1-2"
+ type: "recipes-1-2"
+ values:
+ layout: "recipes-1.2"
+ title_prefix: "Fluo Recipes Documentation - "
+ version: "1.2.0"
+ minor_release: "1.2"
+ docs_base: "/docs/fluo-recipes/1.2"
+ github_base: "https://github.com/apache/fluo-recipes/blob/master"
+      javadoc_fluo: "https://static.javadoc.io/org.apache.fluo/fluo-api/1.1.0-incubating"
+      javadoc_core: "https://static.javadoc.io/org.apache.fluo/fluo-recipes-core/1.1.0-incubating"
+      javadoc_accumulo: "https://static.javadoc.io/org.apache.fluo/fluo-recipes-accumulo/1.1.0-incubating"
# Number of posts displayed on the home page.
num_home_posts: 5
@@ -65,7 +82,7 @@ latest_recipes_release: "1.1.0-incubating"
# Sets links to external API
api_base: "https://javadoc.io/doc/org.apache.fluo"
-api_static: "https://javadoc.io/page/org.apache.fluo"
+api_static: "https://static.javadoc.io/org.apache.fluo"
fluo_api_base: "https://javadoc.io/doc/org.apache.fluo/fluo-api"
fluo_api_static: "https://javadoc.io/page/org.apache.fluo/fluo-api"
fluo_recipes_core_static: "https://javadoc.io/page/org.apache.fluo/fluo-recipes-core"
diff --git a/_layouts/recipes-1.2.html b/_layouts/recipes-1.2.html
new file mode 100644
index 0000000..ad51854
--- /dev/null
+++ b/_layouts/recipes-1.2.html
@@ -0,0 +1,55 @@
+---
+layout: default
+---
+
+<div class="row">
+ <div class="col-md-2">
+    <div class="panel-group" id="accordion" role="tablist" aria-multiselectable="true" data-spy="affix">
+ <div class="panel panel-default">
+ {% assign mydocs = site.recipes-1-2 | group_by: 'category' %}
+ {% assign categories = "getting-started,recipes,tools" | split: "," %}
+ {% for pcat in categories %}
+ {% for dcat in mydocs %}
+ {% if pcat == dcat.name %}
+ <div class="panel-heading" role="tab" id="headingOne">
+ <h4 class="panel-title">
+            <a role="button" data-toggle="collapse" data-parent="#accordion" href="#collapse{{ pcat }}" aria-expanded="{% if pcat == page.category %}true{% else %}false{% endif %}" aria-controls="collapse{{ pcat }}">
+ {{ pcat | capitalize | replace: "-", " " }}
+ </a>
+ </h4>
+ </div>
+        <div id="collapse{{pcat}}" class="panel-collapse collapse{% if pcat == page.category %} in{% endif %}" role="tabpanel" aria-labelledby="headingOne">
+ <div class="panel-body">
+ {% assign items = dcat.items | sort: 'order' %}
+ {% for item in items %}
+            <div class="row doc-sidebar-link"><a href="{{ item.url }}">{{ item.title }}</a></div>
+ {% endfor %}
+ </div>
+ </div>
+ {% endif %}
+ {% endfor %}
+ {% endfor %}
+ </div>
+ </div>
+ </div>
+ <div class="col-md-10">
+ {% if page.category %}
+    <p>Fluo Recipes {{ page.version }} documentation >> {{ page.category | capitalize | replace: "-", " " }} >> {{ page.title }}</p>
+ {% endif %}
+
+    <div class="alert alert-danger" style="margin-bottom: 0px;" role="alert">This documentation is for a future release of Fluo! <a href="{{ site.baseurl }}/docs/fluo/{{ site.latest_fluo_release }}/">View documentation for the latest release</a>.</div>
+
+ {% unless page.nodoctitle %}
+ <div class="row">
+ <div class="col-md-10"><h1>{{ page.title }}</h1></div>
+      <div class="col-md-2"><a class="pull-right" style="margin-top: 25px;" href="https://github.com/apache/fluo-website/edit/master/{{ page.path }}" role="button"><i class="glyphicon glyphicon-pencil"></i> <small>Edit this page</small></a></div>
+ </div>
+ {% endunless %}
+ {{ content }}
+
+ <div class="row" style="margin-top: 20px;">
+      <div class="col-md-10"><strong>Find documentation for all Fluo releases in the <a href="{{ site.baseurl }}/docs/">archive</a></strong></div>
+      <div class="col-md-2"><a class="pull-right" href="https://github.com/apache/fluo-website/edit/master/{{ page.path }}" role="button"><i class="glyphicon glyphicon-pencil"></i> <small>Edit this page</small></a></div>
+ </div>
+ </div>
+</div>
diff --git a/_recipes-1-2/getting-started/overview.md b/_recipes-1-2/getting-started/overview.md
new file mode 100644
index 0000000..459ae2d
--- /dev/null
+++ b/_recipes-1-2/getting-started/overview.md
@@ -0,0 +1,69 @@
+---
+title: Overview
+category: getting-started
+order: 1
+---
+
+Fluo Recipes are common code for Apache Fluo application developers. They build on the
+[Fluo API][fluo-api] to offer additional functionality to developers. They are published
+separately from Fluo on their own release schedule. This allows Fluo Recipes to iterate
+and innovate faster than Fluo (which will maintain a more minimal API on a slower release
+cycle). Fluo Recipes offers code to implement common patterns on top of Fluo's API. It
+also offers glue code to external libraries like Spark and Kryo.
+
+### Usage
+
+The Fluo Recipes project publishes multiple jars to Maven Central for each release.
+The `fluo-recipes-core` jar is the primary jar. It is where most recipes live and where
+they are placed by default if they have minimal dependencies beyond the Fluo API.
+
+Recipes with dependencies that bring in many transitive dependencies are published in
+their own jar. For example, recipes that depend on Apache Spark are published in the
+`fluo-recipes-spark` jar. If you don't plan on using code in the `fluo-recipes-spark`
+jar, you should avoid including it in your pom.xml to avoid a transitive dependency on
+Spark.
+
+Below is a sample Maven POM containing all possible Fluo Recipes dependencies:
+
+```xml
+<properties>
+ <fluo-recipes.version>{{ page.version }}</fluo-recipes.version>
+</properties>
+
+<dependencies>
+  <!-- Required. Contains recipes that only depend on the Fluo API -->
+ <dependency>
+ <groupId>org.apache.fluo</groupId>
+ <artifactId>fluo-recipes-core</artifactId>
+ <version>${fluo-recipes.version}</version>
+ </dependency>
+ <!-- Optional. Serialization code that depends on Kryo -->
+ <dependency>
+ <groupId>org.apache.fluo</groupId>
+ <artifactId>fluo-recipes-kryo</artifactId>
+ <version>${fluo-recipes.version}</version>
+ </dependency>
+ <!-- Optional. Common code for using Fluo with Accumulo -->
+ <dependency>
+ <groupId>org.apache.fluo</groupId>
+ <artifactId>fluo-recipes-accumulo</artifactId>
+ <version>${fluo-recipes.version}</version>
+ </dependency>
+ <!-- Optional. Common code for using Fluo with Spark -->
+ <dependency>
+ <groupId>org.apache.fluo</groupId>
+ <artifactId>fluo-recipes-spark</artifactId>
+ <version>${fluo-recipes.version}</version>
+ </dependency>
+ <!-- Optional. Common code for writing Fluo integration tests -->
+ <dependency>
+ <groupId>org.apache.fluo</groupId>
+ <artifactId>fluo-recipes-test</artifactId>
+ <version>${fluo-recipes.version}</version>
+ <scope>test</scope>
+ </dependency>
+</dependencies>
+```
+
+[fluo-api]: https://fluo.apache.org/apidocs/fluo/
diff --git a/_recipes-1-2/index.md b/_recipes-1-2/index.md
new file mode 100644
index 0000000..f59c6ff
--- /dev/null
+++ b/_recipes-1-2/index.md
@@ -0,0 +1,4 @@
+---
+title: Apache Fluo Recipes documentation
+redirect_to: getting-started/overview
+---
diff --git a/_recipes-1-2/recipes/accumulo-export.md b/_recipes-1-2/recipes/accumulo-export.md
new file mode 100644
index 0000000..1daac62
--- /dev/null
+++ b/_recipes-1-2/recipes/accumulo-export.md
@@ -0,0 +1,103 @@
+---
+title: Accumulo Export
+category: recipes
+order: 3
+---
+
+## Background
+
+The [Export Queue Recipe][1] provides a generic foundation for building an export mechanism to
+any external data store. The [AccumuloExporter] provides an [Exporter] for writing to
+Accumulo. [AccumuloExporter] is located in the `fluo-recipes-accumulo` module and provides the
+following functionality:
+
+ * Safely batches writes to Accumulo made by multiple transactions exporting data.
+ * Stores Accumulo connection information in Fluo configuration, making it accessible by Export
+   Observers running on other nodes.
+ * Provides utility code that makes it easier and shorter to code common Accumulo export patterns.
+
+## Example Use
+
+Exporting to Accumulo is easy. Follow the steps below:
+
+1. First, implement [AccumuloTranslator]. Your implementation translates exported
+   objects to Accumulo Mutations. For example, the `SimpleTranslator` class below translates
+   String key/values into mutations for Accumulo. This step is optional; a lambda could
+   be used in step 3 instead of creating a class.
+
+   ```java
+   public class SimpleTranslator implements AccumuloTranslator<String,String> {
+
+     @Override
+     public void translate(SequencedExport<String, String> export, Consumer<Mutation> consumer) {
+       Mutation m = new Mutation(export.getKey());
+       m.put("cf", "cq", export.getSequence(), export.getValue());
+       consumer.accept(m);
+     }
+   }
+   ```
+
+2. Configure an `ExportQueue` and the export table prior to initializing Fluo.
+
+   ```java
+   FluoConfiguration fluoConfig = ...;
+
+   String instance =    // Name of Accumulo instance exporting to
+   String zookeepers =  // Zookeepers used by Accumulo instance exporting to
+   String user =        // Accumulo username, user that can write to exportTable
+   String password =    // Accumulo user password
+   String exportTable = // Name of table to export to
+
+   // Set properties for table to export to in Fluo app configuration.
+   AccumuloExporter.configure(EXPORT_QID).instance(instance, zookeepers)
+       .credentials(user, password).table(exportTable).save(fluoConfig);
+
+   // Set properties for export queue in Fluo app configuration
+   ExportQueue.configure(EXPORT_QID).keyType(String.class).valueType(String.class)
+       .buckets(119).save(fluoConfig);
+
+   // Initialize Fluo using fluoConfig
+   ```
+
+3. In the application's `ObserverProvider`, register an observer that will process exports and
+   write them to Accumulo using [AccumuloExporter]. Also, register observers that add to the
+   export queue.
+
+   ```java
+   public class MyObserverProvider implements ObserverProvider {
+
+     @Override
+     public void provide(Registry obsRegistry, Context ctx) {
+       SimpleConfiguration appCfg = ctx.getAppConfiguration();
+
+       ExportQueue<String, String> expQ = ExportQueue.getInstance(EXPORT_QID, appCfg);
+
+       // Register observer that will process entries on the export queue and write them to
+       // the Accumulo table configured earlier. SimpleTranslator from step 1 is passed here;
+       // a lambda could have been used instead.
+       expQ.registerObserver(obsRegistry,
+           new AccumuloExporter<>(EXPORT_QID, appCfg, new SimpleTranslator()));
+
+       // An example observer created using a lambda that adds to the export queue.
+       obsRegistry.forColumn(OBS_COL, WEAK).useObserver((tx,row,col) -> {
+         // Read some data and do some work
+
+         // Add results to export queue
+         String key =   // key that identifies export
+         String value = // object to export
+         expQ.add(tx, key, value);
+       });
+     }
+   }
+   ```
+
+## Other use cases
+
+The `getTranslator()` method in [AccumuloReplicator] creates a specialized [AccumuloTranslator]
+for replicating a Fluo table to Accumulo.
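+
+As a hedged sketch of that use case (it assumes `getTranslator()` is static and that the export
+queue's value type is `TxLog`, as the Recording Transaction recipe suggests), the translator can
+be plugged into an [AccumuloExporter] like a hand-written one; `txLogQ` is a hypothetical
+`ExportQueue<Bytes, TxLog>`:
+
+```java
+AccumuloTranslator<Bytes, TxLog> translator = AccumuloReplicator.getTranslator();
+txLogQ.registerObserver(obsRegistry,
+    new AccumuloExporter<>(EXPORT_QID, appCfg, translator));
+```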
+
+[1]: {{ page.docs_base }}/recipes/export-queue/
+[Exporter]: {{ page.javadoc_core }}/org/apache/fluo/recipes/core/export/function/Exporter.html
+[AccumuloExporter]: {{ page.javadoc_accumulo }}/org/apache/fluo/recipes/accumulo/export/function/AccumuloExporter.html
+[AccumuloTranslator]: {{ page.javadoc_accumulo }}/org/apache/fluo/recipes/accumulo/export/function/AccumuloTranslator.html
+[AccumuloReplicator]: {{ page.javadoc_accumulo }}/org/apache/fluo/recipes/accumulo/export/AccumuloReplicator.html
+
diff --git a/_recipes-1-2/recipes/combine-queue.md b/_recipes-1-2/recipes/combine-queue.md
new file mode 100644
index 0000000..56cf51c
--- /dev/null
+++ b/_recipes-1-2/recipes/combine-queue.md
@@ -0,0 +1,210 @@
+---
+title: Combine Queue
+category: recipes
+order: 1
+---
+
+## Background
+
+When many transactions try to modify the same keys, collisions will occur. Too many collisions
+cause transactions to fail and throughput to nose dive. For example, consider [phrasecount],
+which has many transactions processing documents. Each transaction counts the phrases in a
+document and then updates global phrase counts. Since each transaction attempts to update many
+phrases, the probability of collisions is high.
+
+## Solution
+
+The [combine queue recipe][CombineQueue] provides a reusable solution for updating many keys
+while avoiding collisions. The recipe also organizes updates into batches in order to improve
+throughput.
+
+This recipe queues updates to keys for other transactions to process. In the phrase count
+example, transactions processing documents queue updates, but do not actually update the counts.
+Below is an example of computing phrase counts using this recipe.
+
+ * TX1 queues `+1` update for phrase `we want lambdas now`
+ * TX2 queues `+1` update for phrase `we want lambdas now`
+ * TX3 reads the updates and current value for the phrase `we want lambdas now`. There is no
+   current value and the updates sum to 2, so a new value of 2 is written.
+ * TX4 queues `+2` update for phrase `we want lambdas now`
+ * TX5 queues `-1` update for phrase `we want lambdas now`
+ * TX6 reads the updates and current value for the phrase `we want lambdas now`. The current
+   value is 2 and the updates sum to 1, so a new value of 3 is written.
+
+Transactions processing updates have the ability to make additional updates.
+For example, in addition to updating the current value for a phrase, the new
+value could also be placed on an export queue to update an external database.
+
+### Buckets
+
+A simple implementation of this recipe would have an update queue for each key. However, the
+implementation is slightly more complex. Each update queue is in a bucket and transactions
+process all of the updates in a bucket. This allows more efficient processing of updates for the
+following reasons:
+
+ * When updates are queued, notifications are made per bucket (instead of per key).
+ * The transaction doing the update can scan the entire bucket reading updates, which avoids a
+   seek for each key being updated.
+ * Also, the transaction can request a batch lookup to get the current value of all the keys
+   being updated.
+ * Any additional actions taken on update (like adding something to an export queue) can also be
+   batched.
+ * Data is organized to make reading existing values for keys in a bucket more efficient.
+
+Which bucket a key goes to is decided using hash and modulus so that multiple updates for a key
+go to the same bucket.
+
+The initial number of tablets to create when applying table optimizations can be controlled by
+setting the buckets per tablet option when configuring a combine queue. For example, if you
+have 20 tablet servers and 1000 buckets and want 2 tablets per tserver initially, then set
+buckets per tablet to 1000/(2*20)=25. Both points are sketched below.
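+
+The snippet below illustrates both points. The `bucketFor` helper is purely illustrative (the
+recipe assigns buckets internally), and the configuration line applies the 20-tserver arithmetic
+above, assuming the combine queue configurator exposes a `bucketsPerTablet` option like the
+export queue's.
+
+```java
+// Illustrative only: the recipe computes bucket assignment internally.
+static int bucketFor(String key, int numBuckets) {
+  // floorMod keeps the result non-negative, so the same key always maps to the same bucket
+  return Math.floorMod(key.hashCode(), numBuckets);
+}
+
+// 20 tablet servers, 1000 buckets, 2 tablets per tserver initially:
+// 1000 / (2 * 20) = 25 buckets per tablet
+CombineQueue.configure(WcObserverProvider.ID).keyType(String.class).valueType(Long.class)
+    .buckets(1000).bucketsPerTablet(25).save(fluoConfig);
+```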
+
+## Example Use
+
+The following code snippets show how to use this recipe for word count. The first step is to
+configure it before initializing Fluo. When configuring, an ID is needed. This ID is used in two
+ways. First, the ID is used as a row prefix in the table. Therefore nothing else should use that
+row range in the table. Second, the ID is used in generating configuration keys.
+
+The following snippet shows how to configure a combine queue.
+
+```java
+FluoConfiguration fluoConfig = ...;
+
+// Set application properties for the combine queue. These properties are read later by
+// the observers running on each worker.
+CombineQueue.configure(WcObserverProvider.ID)
+ .keyType(String.class).valueType(Long.class).buckets(119).save(fluoConfig);
+
+fluoConfig.setObserverProvider(WcObserverProvider.class);
+
+// initialize Fluo using fluoConfig
+```
+
+Assume the following observer is triggered when a document is updated. It examines new
+and old document content and determines changes in word counts. These changes are pushed to a
+combine queue.
+
+```java
+public class DocumentObserver implements StringObserver {
+ // word count combine queue
+ private CombineQueue<String, Long> wccq;
+
+ public static final Column NEW_COL = new Column("content", "new");
+ public static final Column CUR_COL = new Column("content", "current");
+
+ public DocumentObserver(CombineQueue<String, Long> wccq) {
+ this.wccq = wccq;
+ }
+
+ @Override
+ public void process(TransactionBase tx, String row, Column col) {
+
+ Preconditions.checkArgument(col.equals(NEW_COL));
+
+ String newContent = tx.gets(row, NEW_COL);
+ String currentContent = tx.gets(row, CUR_COL, "");
+
+ Map<String, Long> newWordCounts = getWordCounts(newContent);
+ Map<String, Long> currentWordCounts = getWordCounts(currentContent);
+
+ // determine changes in word counts between old and new document content
+    Map<String, Long> changes = calculateChanges(newWordCounts, currentWordCounts);
+
+ // queue updates to word counts for processing by other transactions
+ wccq.add(tx, changes);
+
+ // update the current content and delete the new content
+ tx.set(row, CUR_COL, newContent);
+ tx.delete(row, NEW_COL);
+ }
+
+ private static Map<String, Long> getWordCounts(String doc) {
+ // TODO extract words from doc
+ }
+
+  private static Map<String, Long> calculateChanges(Map<String, Long> newCounts,
+ Map<String, Long> currCounts) {
+ Map<String, Long> changes = new HashMap<>();
+
+ // guava Maps class
+ MapDifference<String, Long> diffs = Maps.difference(currCounts, newCounts);
+
+ // compute the diffs for words that changed
+ changes.putAll(Maps.transformValues(diffs.entriesDiffering(),
+ vDiff -> vDiff.rightValue() - vDiff.leftValue()));
+
+ // add all new words
+ changes.putAll(diffs.entriesOnlyOnRight());
+
+ // subtract all words no longer present
+    changes.putAll(Maps.transformValues(diffs.entriesOnlyOnLeft(), l -> l * -1));
+
+ return changes;
+ }
+}
+```
+
+Each combine queue has two extension points, a [combiner][Combiner] and a
+[change observer][ChangeObserver]. The combine queue configures a Fluo observer to process
+queued updates. When processing updates, the two extension points are called. The code below
+shows how to use these extension points.
+
+A change observer can do additional processing when a batch of key values is updated. Below,
+updates are queued for export to an external database. The export is given the new and old
+value, allowing it to delete the old value if needed.
+
+```java
+public class WcObserverProvider implements ObserverProvider {
+
+ public static final String ID = "wc";
+
+ @Override
+ public void provide(Registry obsRegistry, Context ctx) {
+
+ ExportQueue<String, MyDatabaseExport> exportQ = createExportQueue(ctx);
+
+ // Create a combine queue for computing word counts.
+    CombineQueue<String, Long> wcMap = CombineQueue.getInstance(ID, ctx.getAppConfiguration());
+
+ // Register observer that updates the Combine Queue
+    obsRegistry.forColumn(DocumentObserver.NEW_COL, STRONG)
+        .useObserver(new DocumentObserver(wcMap));
+
+    // Used to join new and existing values for a key. The lambda sums all values and returns
+    // Optional.empty() when the sum is zero. Returning Optional.empty() causes the key/value
+    // to be deleted. Could have used the built-in SummingCombiner.
+    Combiner<String, Long> combiner =
+        input -> input.stream().reduce(Long::sum).filter(l -> l != 0);
+
+    // Called when the value of a key changes. The lambda exports these changes to an external
+ // database. Make sure to read ChangeObserver's javadoc.
+ ChangeObserver<String, Long> changeObs = (tx, changes) -> {
+ for (Change<String, Long> update : changes) {
+ String word = update.getKey();
+ Optional<Long> oldVal = update.getOldValue();
+ Optional<Long> newVal = update.getNewValue();
+
+        // Queue an export to let an external database know the word count has changed.
+ exportQ.add(tx, word, new MyDatabaseExport(oldVal, newVal));
+ }
+ };
+
+    // Register observer that handles updates to the CombineQueue. This observer will use the
+    // combiner and change observer.
+ wcMap.registerObserver(obsRegistry, combiner, changeObs);
+ }
+}
+```
+
+## Guarantees
+
+This recipe makes two important guarantees about updates for a key when it
+calls `process()` on a [ChangeObserver].
+
+ * The new value reported for an update will be derived from combining all
+   updates that were committed before the transaction that's processing updates
+   started. The implementation may have to make multiple passes over queued
+   updates to achieve this. In the situation where TX1 queues a `+1` and later
+   TX2 queues a `-1` for the same key, there is no need to worry about only seeing
+   the `-1` processed. A transaction that started processing updates after TX2
+   committed would process both.
+ * The old value will always be what was reported as the new value in the
+ previous transaction that called `ChangeObserver.process()`.
+
+[phrasecount]: https://github.com/astralway/phrasecount
+[CombineQueue]: {{ page.javadoc_core }}/org/apache/fluo/recipes/core/combine/CombineQueue.html
+[ChangeObserver]: {{ page.javadoc_core }}/org/apache/fluo/recipes/core/combine/ChangeObserver.html
+[Combiner]: {{ page.javadoc_core }}/org/apache/fluo/recipes/core/combine/Combiner.html
diff --git a/_recipes-1-2/recipes/export-queue.md b/_recipes-1-2/recipes/export-queue.md
new file mode 100644
index 0000000..7b42a9b
--- /dev/null
+++ b/_recipes-1-2/recipes/export-queue.md
@@ -0,0 +1,305 @@
+---
+title: Export Queue
+category: recipes
+order: 2
+---
+
+## Background
+
+Fluo is not suited for servicing low latency queries for two reasons. First, the implementation
+of transactions is designed for throughput. To get throughput, transactions recover lazily from
+failures and may wait on other transactions. Both of these design decisions can
+lead to delays of individual transactions, but do not negatively impact throughput. The second
+reason is that Fluo observers executing transactions will likely cause a large number of random
+accesses. This could lead to high response time variability for an individual random access.
+This variability would not impede throughput but would impede the goal of low latency.
+
+One way to make data transformed by Fluo available for low latency queries is
+to export that data to another system. For example Fluo could run on
+cluster A, continually transforming a large data set, and exporting data to
+Accumulo tables on cluster B. The tables on cluster B would service user
+queries. Fluo Recipes has built in support for [exporting to Accumulo][aeq],
+however this recipe can be used to export to systems other than Accumulo, like
+Redis, Elasticsearch, MySQL, etc.
+
+Exporting data from Fluo is easy to get wrong, which is why this recipe exists.
+To understand what can go wrong consider the following example observer
+transaction.
+
+```java
+public class MyObserver implements StringObserver {
+
+ static final Column UPDATE_COL = new Column("meta", "numUpdates");
+ static final Column COUNTER_COL = new Column("meta", "counter1");
+
+  //represents a query system external to Fluo that is updated by Fluo
+ QuerySystem querySystem;
+
+ @Override
+ public void process(TransactionBase tx, String row, Column col) {
+
+ int oldCount = Integer.parseInt(tx.gets(row, COUNTER_COL, "0"));
+ int numUpdates = Integer.parseInt(tx.gets(row, UPDATE_COL, "0"));
+ int newCount = oldCount + numUpdates;
+
+ tx.set(row, COUNTER_COL, "" + newCount);
+ tx.delete(row, UPDATE_COL);
+
+ //Build an inverted index in the query system, based on count from the
+ //meta:counter1 column in fluo. Do this by creating rows for the
+ //external query system based on the count.
+ String oldCountRow = String.format("%06d", oldCount);
+ String newCountRow = String.format("%06d", newCount);
+
+ //add a new entry to the inverted index
+ querySystem.insertRow(newCountRow, row);
+ //remove the old entry from the inverted index
+ querySystem.deleteRow(oldCountRow, row);
+ }
+}
+```
+
+The above example would keep the external index up to date beautifully as long
+as the following conditions are met.
+
+ * Threads executing transactions always complete successfully.
+ * Only a single thread ever responds to a notification.
+
+However, these conditions are not guaranteed by Fluo. Multiple threads may
+attempt to process a notification concurrently (only one may succeed). Also, at
+any point in time a transaction may fail (for example, the computer executing it
+may reboot). Both of these problems will occur and will lead to corruption of
+the external index in the example. The inverted index and Fluo will become
+inconsistent. The inverted index will end up with multiple entries (that are
+never cleaned up) for a single entity even though the intent is to only have one.
+
+The root of the problem in the example above is that it's exporting uncommitted
+data. There is no guarantee that setting the column `<row>:meta:counter1` to
+`newCount` will succeed until the transaction is successfully committed.
+However, `newCountRow` is derived from `newCount` and written to the external query
+system before the transaction is committed (Note: for observers, the
+transaction is committed by the framework after `process(...)` is called). So
+if the transaction fails, the next time it runs it could compute a completely
+different value for `newCountRow` (and it would not delete what was written by the
+failed transaction).
+
+## Solution
+
+The simple solution to the problem of exporting uncommitted data is to only
+export committed data. There are multiple ways to accomplish this. This
+recipe offers a reusable implementation of one method. This recipe has the
+following elements:
+
+ * An export queue that transactions can add key/values to. Only if the transaction commits
+   successfully will the key/value end up in the queue. A Fluo application can have multiple
+   export queues; each one must have a unique id.
+ * When a key/value is added to the export queue, it's given a sequence number. This sequence
+   number is based on the transaction's start timestamp.
+ * Each export queue is configured with an observer that processes key/values that were
+   successfully committed to the queue.
+ * When key/values in an export queue are processed, they are deleted, so the export queue does
+   not keep any long term data.
+ * Key/values in an export queue are placed in buckets. This is done so that all of the updates
+   in a bucket can be processed in a single transaction. This allows an efficient implementation
+   of this recipe in Fluo. It can also lead to efficiency in a system being exported to, if the
+   system can benefit from batching updates. The number of buckets in an export queue is
+   configurable.
+
+There are three requirements for using this recipe:
+
+ * Must configure export queues before initializing a Fluo application.
+ * Transactions adding to an export queue must get an instance of the queue using its unique QID.
+ * Must create a class or lambda that implements [Exporter] in order to process exports.
+
+## Example Use
+
+This example shows how to incrementally build an inverted index in an external query system
+using an export queue. The class below is a simple POJO used as the value for the export queue.
+
+```java
+class CountUpdate {
+ public int oldCount;
+ public int newCount;
+
+ public CountUpdate(int oc, int nc) {
+ this.oldCount = oc;
+ this.newCount = nc;
+ }
+}
+```
+
+The following code shows how to configure an export queue. This code
+modifies the FluoConfiguration object with options needed for the export queue.
+This FluoConfiguration object should be used to initialize the Fluo
+application.
+
+```java
+public class FluoApp {
+
+ // export queue id "ici" means inverted count index
+ public static final String EQ_ID = "ici";
+
+ static final Column UPDATE_COL = new Column("meta", "numUpdates");
+ static final Column COUNTER_COL = new Column("meta", "counter1");
+
+ public static class AppObserverProvider implements ObserverProvider {
+ @Override
+ public void provide(Registry obsRegistry, Context ctx) {
+ ExportQueue<String, CountUpdate> expQ =
+ ExportQueue.getInstance(EQ_ID, ctx.getAppConfiguration());
+
+ // register observer that will queue data to export
+      obsRegistry.forColumn(UPDATE_COL, STRONG).useObserver(new MyObserver(expQ));
+
+ // register observer that will export queued data
+ expQ.registerObserver(obsRegistry, new CountExporter());
+ }
+ }
+
+ /**
+ * Call this method before initializing Fluo.
+ *
+   * @param fluoConfig the configuration object that will be used to initialize Fluo
+ */
+ public static void preInit(FluoConfiguration fluoConfig) {
+
+ // Set properties for export queue in Fluo app configuration
+    ExportQueue.configure(EQ_ID)
+ .keyType(String.class)
+ .valueType(CountUpdate.class)
+ .buckets(1009)
+ .bucketsPerTablet(10)
+        .save(fluoConfig);
+
+ fluoConfig.setObserverProvider(AppObserverProvider.class);
+ }
+}
+```
+
+Below is an updated version of the observer from above that now uses an export
+queue.
+
+```java
+public class MyObserver implements StringObserver {
+
+ private ExportQueue<String, CountUpdate> exportQueue;
+
+ public MyObserver(ExportQueue<String, CountUpdate> exportQueue) {
+ this.exportQueue = exportQueue;
+ }
+
+ @Override
+ public void process(TransactionBase tx, String row, Column col) {
+
+ int oldCount = Integer.parseInt(tx.gets(row, FluoApp.COUNTER_COL, "0"));
+ int numUpdates = Integer.parseInt(tx.gets(row, FluoApp.UPDATE_COL, "0"));
+ int newCount = oldCount + numUpdates;
+
+ tx.set(row, FluoApp.COUNTER_COL, "" + newCount);
+ tx.delete(row, FluoApp.UPDATE_COL);
+
+ // Because the update to the export queue is part of the transaction,
+ // either the update to meta:counter1 is made and an entry is added to
+ // the export queue or neither happens.
+ exportQueue.add(tx, row, new CountUpdate(oldCount, newCount));
+ }
+}
+```
+
+The export queue will call the `export()` method on the class below to process entries queued
+for export. It is possible the call to `export()` can fail part way through and/or be called
+multiple times. In the case of failures, the export consumer will be called again with the same
+data. It's possible for the same export entry to be processed on multiple computers at different
+times. This can cause exports to arrive out of order. The purpose of the sequence number is to
+help systems receiving out of order and redundant data.
+
+```java
+public class CountExporter implements Exporter<String, CountUpdate> {
+ // represents the external query system we want to update from Fluo
+ QuerySystem querySystem;
+
+ @Override
+ public void export(Iterator<SequencedExport<String, CountUpdate>> exports) {
+ BatchUpdater batchUpdater = querySystem.getBatchUpdater();
+
+ while (exports.hasNext()) {
+ SequencedExport<String, CountUpdate> export = exports.next();
+ String row = export.getKey();
+ CountUpdate uc = export.getValue();
+ long seqNum = export.getSequence();
+
+ String oldCountRow = String.format("%06d", uc.oldCount);
+ String newCountRow = String.format("%06d", uc.newCount);
+
+ // add a new entry to the inverted index
+ batchUpdater.insertRow(newCountRow, row, seqNum);
+ // remove the old entry from the inverted index
+ batchUpdater.deleteRow(oldCountRow, row, seqNum);
+ }
+
+ // flush all of the updates to the external query system
+ batchUpdater.close();
+ }
+}
+```
+
+## Schema
+
+Each export queue stores its data in the Fluo table in a contiguous row range.
+This row range is defined by using the export queue id as a row prefix for all
+data in the export queue. So the row range defined by the export queue id
+should not be used by anything else.
+
+All data stored in an export queue is [transient]. When an export
+queue is configured, it will recommend split points using the [table
+optimization process][table-opt]. The number of splits generated
+by this process can be controlled by setting the number of buckets per tablet
+when configuring an export queue.
+
+## Concurrency
+
+Additions to the export queue will never collide. If two transactions add the
+same key at around the same time and successfully commit, then two entries with
+different sequence numbers will always be added to the queue. The sequence
+number is based on the start timestamp of the transactions.
+
+If the key used to add items to the export queue is deterministically derived
+from something the transaction is writing to, then that will cause a collision.
+For example, consider the following interleaving of two transactions adding to
+the same export queue in a manner that will collide. Note, TH1 is shorthand for
+thread 1, ek() is a function that creates the export key, and ev() is a function
+that creates the export value.
+
+ 1. TH1 : key1 = ek(`row1`,`fam1:qual1`)
+ 1. TH1 : val1 = ev(tx1.get(`row1`,`fam1:qual1`), tx1.get(`rowA`,`fam1:qual2`))
+ 1. TH1 : exportQueueA.add(tx1, key1, val1)
+ 1. TH2 : key2 = ek(`row1`,`fam1:qual1`)
+ 1. TH2 : val2 = ev(tx2.get(`row1`,`fam1:qual1`), tx2.get(`rowB`,`fam1:qual2`))
+ 1. TH2 : exportQueueA.add(tx2, key2, val2)
+ 1. TH1 : tx1.set(`row1`,`fam1:qual1`, val1)
+ 1. TH2 : tx2.set(`row1`,`fam1:qual1`, val2)
+
+In the example above only one transaction will succeed because both are setting
+`row1 fam1:qual1`. Since adding to the export queue is part of the
+transaction, only the transaction that succeeds will add something to the
+queue. If the function ek() in the example is deterministic, then both
+transactions would have been trying to add the same key to the export queue.
+
+With the above method, we know that transactions adding entries to the queue for
+the same key must have executed [serially][serial]. Knowing that transactions which
+added the same key did not overlap in time makes reasoning about those export
+entries very simple.
+
+The example below is a slight modification of the example above. In this
+example both transactions will successfully add entries to the queue using the
+same key. Both transactions succeed because they are writing to different
+cells (`rowB fam1:qual2` and `rowA fam1:qual2`). This approach makes it more
+difficult to reason about export entries with the same key, because the
+transactions adding those entries could have overlapped in time. This is an
+example of write skew mentioned in the Percolator paper.
+
+ 1. TH1 : key1 = ek(`row1`,`fam1:qual1`)
+ 1. TH1 : val1 = ev(tx1.get(`row1`,`fam1:qual1`), tx1.get(`rowA`,`fam1:qual2`))
+ 1. TH1 : exportQueueA.add(tx1, key1, val1)
+ 1. TH2 : key2 = ek(`row1`,`fam1:qual1`)
+ 1. TH2 : val2 = ev(tx2.get(`row1`,`fam1:qual1`), tx2.get(`rowB`,`fam1:qual2`))
+ 1. TH2 : exportQueueA.add(tx2, key2, val2)
+ 1. TH1 : tx1.set(`rowA`,`fam1:qual2`, val1)
+ 1. TH2 : tx2.set(`rowB`,`fam1:qual2`, val2)
+
+[Exporter]: {{ page.javadoc_core }}/org/apache/fluo/recipes/core/export/function/Exporter.html
+[serial]: https://en.wikipedia.org/wiki/Serializability
+[aeq]: {{ page.docs_base }}/recipes/accumulo-export-queue/
+[transient]: {{ page.docs_base }}/tools/transient/
+[table-opt]: {{ page.docs_base }}/tools/table-optimization/
diff --git a/_recipes-1-2/recipes/recording-tx.md b/_recipes-1-2/recipes/recording-tx.md
new file mode 100644
index 0000000..703fdda
--- /dev/null
+++ b/_recipes-1-2/recipes/recording-tx.md
@@ -0,0 +1,76 @@
+---
+title: Recording Transaction
+category: recipes
+order: 5
+---
+
+A [RecordingTransaction] is an implementation of [Transaction] that logs all transaction
+operations (i.e., GET, SET, or DELETE) to a `TxLog` object for later uses such as exporting
+data. The code below shows how a RecordingTransaction is created by wrapping a Transaction
+object:
+
+```java
+RecordingTransactionBase rtx = RecordingTransactionBase.wrap(tx);
+```
+
+A predicate function can be passed to the wrap method to select which log entries to record.
+The code below only records log entries whose column family is `meta`:
+
+```java
+RecordingTransactionBase rtx = RecordingTransactionBase.wrap(tx,
+    le -> le.getColumn().getFamily().toString().equals("meta"));
+```
+
+After creating a [RecordingTransaction], users can use it as they would use a
+Transaction object.
+
+```java
+Bytes value = rtx.get(Bytes.of("r1"), new Column("cf1", "cq1"));
+```
+
+While SET or DELETE operations are always recorded to the log, GET operations are only recorded
+if a value was found at the requested row/column. Also, if a GET method returns an iterator,
+only the GET operations that are retrieved from the iterator are logged. GET operations are
+logged as they are necessary if you want to determine the changes made by the transaction.
+
+When you are done operating on the transaction, you can retrieve the TxLog using the following
+code:
+
+```java
+TxLog myTxLog = rtx.getTxLog();
+```
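+
+To give a feel for what the log contains, below is a hedged sketch of walking the returned
+TxLog. It assumes TxLog exposes its entries as a list of LogEntry objects with operation, row,
+column, and value accessors; check the [RecordingTransaction] javadoc for the exact API.
+
+```java
+// Assumed accessors: getLogEntries(), getOp(), getRow(), getColumn(), getValue()
+for (LogEntry entry : myTxLog.getLogEntries()) {
+  // The operation is GET, SET, or DELETE
+  System.out.println(entry.getOp() + " " + entry.getRow() + " "
+      + entry.getColumn() + " " + entry.getValue());
+}
+```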
+
+Below is example code showing how a [RecordingTransaction] can be used in an observer to record
+all operations performed by the transaction in a TxLog. In this example, a GET (if data exists)
+and a SET operation will be logged. This TxLog can be added to an export queue and later used to
+export updates from Fluo.
+
+```java
+public class MyObserver extends AbstractObserver {
+
+  private static final TypeLayer TYPEL = new TypeLayer(new StringEncoder());
+
+ private ExportQueue<Bytes, TxLog> exportQueue;
+
+ @Override
+ public void process(TransactionBase tx, Bytes row, Column col) {
+
+ // create recording transaction (rtx)
+ RecordingTransactionBase rtx = RecordingTransactionBase.wrap(tx);
+
+ // use rtx to create a typed transaction & perform operations
+ TypedTransactionBase ttx = TYPEL.wrap(rtx);
+    int count = ttx.get().row(row).fam("meta").qual("counter1").toInteger(0);
+ ttx.mutate().row(row).fam("meta").qual("counter1").set(count+1);
+
+ // when finished performing operations, retrieve transaction log
+    TxLog txLog = rtx.getTxLog();
+
+ // add txLog to exportQueue if not empty
+ if (!txLog.isEmpty()) {
+ //do not pass rtx to exportQueue.add()
+      exportQueue.add(tx, row, txLog);
+ }
+ }
+}
+```
+
+[RecordingTransaction]: {{ page.javadoc_core }}/org/apache/fluo/recipes/core/transaction/RecordingTransaction.html
+[Transaction]: {{ page.javadoc_fluo }}/org/apache/fluo/api/client/Transaction.html
diff --git a/_recipes-1-2/recipes/row-hasher.md b/_recipes-1-2/recipes/row-hasher.md
new file mode 100644
index 0000000..4b0f712
--- /dev/null
+++ b/_recipes-1-2/recipes/row-hasher.md
@@ -0,0 +1,121 @@
+---
+title: Row Hash Prefix
+category: recipes
+order: 4
+---
+
+## Background
+
+Transactions are implemented in Fluo using conditional mutations. Conditional
+mutations require server side processing on tservers. If data is not spread
+evenly, it can cause some tservers to execute more conditional mutations than
+others. These tservers doing more work can become a bottleneck. Most real
+world data is not uniform and can cause this problem.
+
+Before the Fluo [Webindex example][1] started using this recipe it suffered
+from this problem. The example was using reverse DNS encoded URLs for row keys
+like `p:com.cnn/story1.html`. This made certain portions of the table more
+popular, which in turn made some tservers do much more work. This uneven
+distribution of work led to lower throughput and uneven performance. Using
+this recipe made those problems go away.
+
+## Solution
+
+This recipe provides code to help add a hash of the row as a prefix of the row.
+Using this recipe rows are structured like the following.
+
+```
+<prefix>:<fixed len row hash>:<user row>
+```
+
+The recipe also provides code to help generate split points and configure
+balancing of the prefix.
+
+## Example Use
+
+```java
+import org.apache.fluo.api.config.FluoConfiguration;
+import org.apache.fluo.api.data.Bytes;
+import org.apache.fluo.recipes.core.data.RowHasher;
+
+public class RowHasherExample {
+
+ private static final RowHasher PAGE_ROW_HASHER = new RowHasher("p");
+
+ // Provide one place to obtain row hasher.
+ public static RowHasher getPageRowHasher() {
+ return PAGE_ROW_HASHER;
+ }
+
+ public static void main(String[] args) {
+ RowHasher pageRowHasher = getPageRowHasher();
+
+ String revUrl = "org.wikipedia/accumulo";
+
+ // Add a hash prefix to the row. Use this hashedRow in your transaction
+ Bytes hashedRow = pageRowHasher.addHash(revUrl);
+ System.out.println("hashedRow : " + hashedRow);
+
+    // Remove the prefix. This can be used by transactions dealing with the hashed row.
+ Bytes orig = pageRowHasher.removeHash(hashedRow);
+ System.out.println("orig : " + orig);
+
+    // Generate table optimizations for the recipe. This can be called when setting up an
+ // application that uses a hashed row.
+ int numTablets = 20;
+
+    // The following code would normally be called before initializing Fluo. This code
+ // registers table optimizations for your prefix+hash.
+ FluoConfiguration conf = new FluoConfiguration();
+ RowHasher.configure(conf, PAGE_ROW_HASHER.getPrefix(), numTablets);
+
+    // Normally you would not call the following code; it would be called automatically for
+    // you by TableOperations.optimizeTable(). It is called here to show what table
+    // optimization will be generated.
+ TableOptimizations tableOptimizations = new RowHasher.Optimizer()
+        .getTableOptimizations(PAGE_ROW_HASHER.getPrefix(), conf.getAppConfiguration());
+    System.out.println("Balance config : " + tableOptimizations.getTabletGroupingRegex());
+ System.out.println("Splits : ");
+ tableOptimizations.getSplits().forEach(System.out::println);
+ System.out.println();
+ }
+}
+```
+
+The example program above prints the following.
+
+```
+hashedRow : p:1yl0:org.wikipedia/accumulo
+orig : org.wikipedia/accumulo
+Balance config : (\Qp:\E).*
+Splits :
+p:1sst
+p:3llm
+p:5eef
+p:7778
+p:9001
+p:assu
+p:clln
+p:eeeg
+p:g779
+p:i002
+p:jssv
+p:lllo
+p:neeh
+p:p77a
+p:r003
+p:sssw
+p:ullp
+p:weei
+p:y77b
+p:~
+```
+
+The split points are used to create tablets in the Accumulo table used by Fluo.
+Data and computation will spread very evenly across these tablets. The
+Balancing config will spread the tablets evenly across the tablet servers,
+which will spread the computation evenly. See the [table optimizations][2]
+documentation for information on how to apply the optimizations.
+
+[1]: https://github.com/astralway/webindex
+[2]: {{ page.docs_base }}/tools/table-optimization/
diff --git a/_recipes-1-2/tools/serialization.md b/_recipes-1-2/tools/serialization.md
new file mode 100644
index 0000000..df4b73e
--- /dev/null
+++ b/_recipes-1-2/tools/serialization.md
@@ -0,0 +1,76 @@
+---
+title: Serializing Data
+category: tools
+order: 1
+---
+
+Various Fluo Recipes deal with POJOs and need to serialize them. The
+serialization mechanism is configurable and defaults to using [Kryo].
+
+## Custom Serialization
+
+In order to use a custom serialization method, two steps need to be taken. The
+first step is to implement [SimpleSerializer]. The second step is to
+configure Fluo Recipes to use the custom implementation. This needs to be done
+before initializing Fluo. Below is an example of how to do this.
+
+```java
+FluoConfiguration fluoConfig = ...;
+//assume MySerializer implements SimpleSerializer
+SimpleSerializer.setSerializer(fluoConfig, MySerializer.class);
+//initialize Fluo using fluoConfig
+```
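+
+For reference, a minimal `MySerializer` might look like the hedged sketch below. It assumes
+[SimpleSerializer] exposes `init`, `serialize`, and `deserialize` methods as in the 1.x recipes
+API, and it uses plain Java serialization, which is simpler but slower and larger than Kryo.
+
+```java
+import java.io.*;
+
+import org.apache.fluo.api.config.SimpleConfiguration;
+import org.apache.fluo.recipes.core.serialization.SimpleSerializer;
+
+public class MySerializer implements SimpleSerializer {
+
+  @Override
+  public void init(SimpleConfiguration appConfig) {
+    // plain Java serialization needs no configuration
+  }
+
+  @Override
+  public <T> byte[] serialize(T obj) {
+    try (ByteArrayOutputStream baos = new ByteArrayOutputStream();
+        ObjectOutputStream oos = new ObjectOutputStream(baos)) {
+      oos.writeObject(obj);
+      oos.flush();
+      return baos.toByteArray();
+    } catch (IOException e) {
+      throw new UncheckedIOException(e);
+    }
+  }
+
+  @Override
+  public <T> T deserialize(byte[] serObj, Class<T> clazz) {
+    try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(serObj))) {
+      return clazz.cast(ois.readObject());
+    } catch (IOException e) {
+      throw new UncheckedIOException(e);
+    } catch (ClassNotFoundException e) {
+      throw new IllegalStateException(e);
+    }
+  }
+}
+```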
+
+## Kryo Factory
+
+If using the default Kryo serializer implementation, then creating a
+KryoFactory implementation can lead to smaller serialization size. When Kryo
+serializes an object graph, it will by default include the fully qualified
+names of the classes in the serialized data. This can be avoided by
+[registering classes][register] that will be serialized. Registration is done by
+creating a KryoFactory and then configuring Fluo Recipes to use it. The
+example below shows how to do this.
+
+For example assume the POJOs named `Node` and `Edge` will be serialized and
+need to be registered with Kryo. This could be done by creating a KryoFactory
+like the following.
+
+```java
+package com.foo;
+
+import com.esotericsoftware.kryo.Kryo;
+import com.esotericsoftware.kryo.pool.KryoFactory;
+
+import com.foo.data.Edge;
+import com.foo.data.Node;
+
+public class MyKryoFactory implements KryoFactory {
+ @Override
+ public Kryo create() {
+ Kryo kryo = new Kryo();
+
+    //Explicitly assign each class a unique id here to ensure it's stable over
+ //time and in different environments with different dependencies.
+ kryo.register(Node.class, 9);
+ kryo.register(Edge.class, 10);
+
+ //instruct kryo that these are the only classes we expect to be serialized
+ kryo.setRegistrationRequired(true);
+
+ return kryo;
+ }
+}
+```
+
+Fluo Recipes must be configured to use this factory. The following code shows
+how to do this.
+
+```java
+FluoConfiguration fluoConfig = ...;
+KryoSimplerSerializer.setKryoFactory(fluoConfig, MyKryoFactory.class);
+//initialize Fluo using fluoConfig
+```
+
+[Kryo]: https://github.com/EsotericSoftware/kryo
+[SimpleSerializer]: {{ page.javadoc_core }}/org/apache/fluo/recipes/core/serialization/SimpleSerializer.html
+[register]: https://github.com/EsotericSoftware/kryo#registration
diff --git a/_recipes-1-2/tools/spark.md b/_recipes-1-2/tools/spark.md
new file mode 100644
index 0000000..561879d
--- /dev/null
+++ b/_recipes-1-2/tools/spark.md
@@ -0,0 +1,19 @@
+---
+title: Spark Helper
+category: tools
+order: 4
+---
+
+Fluo Recipes has some helper code for [Apache Spark][spark]. Most of the helper code is
+currently related to bulk importing data into Accumulo. This is useful for initializing a new
+Fluo table with historical data via Spark. The Spark helper code is found in the
+[fluo-recipes-spark module][frs].
+
+For information on using Spark to load data into Fluo, check out this [blog post][blog].
+
+If you know of other Spark+Fluo integration code that would be useful, then please consider
+[opening an issue](https://github.com/apache/fluo-recipes/issues/new).
+
+[spark]: https://spark.apache.org
+[frs]: {{ site.api_base }}/fluo-recipes-spark/{{ page.version }}
+[blog]: https://fluo.apache.org/blog/2016/12/22/spark-load/
+
diff --git a/_recipes-1-2/tools/table-optimization.md b/_recipes-1-2/tools/table-optimization.md
new file mode 100644
index 0000000..e2202bd
--- /dev/null
+++ b/_recipes-1-2/tools/table-optimization.md
@@ -0,0 +1,66 @@
+---
+title: Table Optimization
+category: tools
+order: 3
+---
+
+## Background
+
+Recipes may need to make Accumulo specific table modifications for optimal
+performance. Configuring the [Accumulo tablet balancer][3] and adding splits are
+two optimizations that are currently done. Offering a standard way to do these
+optimizations makes it easier to use recipes correctly. These optimizations
+are optional. You could skip them for integration testing, but would probably
+want to use them in production.
+
+## Java Example
+
+```java
+FluoConfiguration fluoConf = ...
+
+//export queue configure method will return table optimizations it would like made
+ExportQueue.configure(fluoConf, ...);
+
+//CollisionFreeMap.configure() will return table optimizations it would like made
+CollisionFreeMap.configure(fluoConf, ...);
+
+//configure optimizations for a prefixed hash range of a table
+RowHasher.configure(fluoConf, ...);
+
+//initialize Fluo
+FluoFactory.newAdmin(fluoConf).initialize(...)
+
+//Automatically optimize the Fluo table for all configured recipes
+TableOperations.optimizeTable(fluoConf);
+```
+
+[TableOperations][2] is provided in the Accumulo module of Fluo Recipes.
+
+## Command Example
+
+Fluo Recipes provides an easy way to optimize a Fluo table for configured
+recipes from the command line. This should be done after configuring recipes
+and initializing Fluo. Below are example commands for initializing in this way.
+
+```bash
+
+#create application
+fluo new app1
+
+#configure application
+
+#initialize Fluo
+fluo init app1
+
+#optimize table for all configured recipes
+fluo exec app1 org.apache.fluo.recipes.accumulo.cmds.OptimizeTable
+```
+
+## Table optimization registry
+
+Recipes register themselves by calling [TableOptimizations.registerOptimization()][1]. Anyone
+can use this mechanism; it's not limited to use by existing recipes. A hedged sketch follows.
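+
+The sketch below is an assumption-laden illustration, not verified API: it guesses that a
+factory implements the `getTableOptimizations(String, SimpleConfiguration)` hook seen in the
+RowHasher example, and that `registerOptimization()` takes the app configuration, an id, and a
+factory class. Check the [TableOptimizations javadoc][1] for the real signatures.
+
+```java
+// Hypothetical custom optimization factory for a recipe with id "myrecipe"
+public class MyOptimizationFactory implements TableOptimizations.TableOptimizationsFactory {
+  @Override
+  public TableOptimizations getTableOptimizations(String id, SimpleConfiguration appConfig) {
+    TableOptimizations optimizations = new TableOptimizations();
+    optimizations.setSplits(Arrays.asList(Bytes.of("m"))); // recommend a split point
+    return optimizations;
+  }
+}
+
+// Before initializing Fluo, register the factory under the recipe's id
+TableOptimizations.registerOptimization(fluoConfig.getAppConfiguration(), "myrecipe",
+    MyOptimizationFactory.class);
+```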
+
+[1]: {{ page.javadoc_core }}/org/apache/fluo/recipes/core/common/TableOptimizations.html
+[2]: {{ page.javadoc_accumulo }}/org/apache/fluo/recipes/accumulo/ops/TableOperations.html
+[3]: http://accumulo.apache.org/blog/2015/03/20/balancing-groups-of-tablets.html
diff --git a/_recipes-1-2/tools/testing.md b/_recipes-1-2/tools/testing.md
new file mode 100644
index 0000000..cb0eb09
--- /dev/null
+++ b/_recipes-1-2/tools/testing.md
@@ -0,0 +1,15 @@
+---
+title: Testing
+category: tools
+order: 5
+---
+
+Fluo includes MiniFluo, which makes it possible to write an integration test that
+runs against a real Fluo instance. Fluo Recipes provides the following utility
+code for writing an integration test.
+
+ * [FluoITHelper][1] A class with utility methods for comparing expected data with what's in
+   Fluo.
+ * [AccumuloExportITBase][2] A base class for writing an integration test that exports data
+   from Fluo to an external Accumulo table.
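+
+For context, a bare-bones MiniFluo test loop looks like the hedged sketch below. The MiniFluo
+calls are core Fluo API; the final `FluoITHelper` line is an assumption about its utility
+methods, so check its javadoc for the exact comparison helpers.
+
+```java
+FluoConfiguration conf = new FluoConfiguration();
+conf.setApplicationName("it-test");
+conf.setMiniStartAccumulo(true); // let MiniFluo start its own mini Accumulo
+
+try (MiniFluo mini = FluoFactory.newMiniFluo(conf);
+    FluoClient client = FluoFactory.newClient(mini.getClientConfiguration())) {
+
+  try (Transaction tx = client.newTransaction()) {
+    tx.set("row1", new Column("cf", "cq"), "v1");
+    tx.commit();
+  }
+
+  // wait for any configured observers to finish processing notifications
+  mini.waitForObservers();
+
+  // Assumption: FluoITHelper offers helpers like this for inspecting table state
+  FluoITHelper.printFluoTable(client);
+}
+```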
+
+[1]: {{ site.api_static }}/fluo-recipes-test/{{ page.version }}/org/apache/fluo/recipes/test/FluoITHelper.html
+[2]: {{ site.api_static }}/fluo-recipes-test/{{ page.version }}/org/apache/fluo/recipes/test/AccumuloExportITBase.html
diff --git a/_recipes-1-2/tools/transient.md b/_recipes-1-2/tools/transient.md
new file mode 100644
index 0000000..2e3b5ec
--- /dev/null
+++ b/_recipes-1-2/tools/transient.md
@@ -0,0 +1,85 @@
+---
+title: Transient Data
+category: tools
+order: 2
+---
+
+## Background
+
+Some recipes store transient data in a portion of the Fluo table. Transient
+data is data that's continually being added and deleted. Also, these transient
+data ranges contain no long term data. The way Fluo works, when data is
+deleted a delete marker is inserted but the data is actually still there. Over
+time these transient ranges of the table will have a lot more delete markers
+than actual data if nothing is done. If nothing is done, then processing
+transient data will get increasingly slower over time.
+
+These delete markers can be cleaned up by forcing Accumulo to compact the
+Fluo table, which will run Fluo's garbage collection iterator. However,
+compacting the entire table to clean up these ranges within a table is
+overkill. Alternatively, Accumulo supports compacting ranges of a table. So
+a good solution to the delete marker problem is to periodically compact just
+the transient ranges.
+
+Fluo Recipes provides helper code to deal with transient data ranges in a
+standard way.
+
+## Registering Transient Ranges
+
+Recipes like [Export Queue][export-queue] will automatically register
+transient ranges when configured. If you would like to register your own
+transient ranges, use [TransientRegistry]. Below is a simple example of
+using this.
+
+```java
+FluoConfiguration fluoConfig = ...;
+TransientRegistry transientRegistry = new TransientRegistry(fluoConfig.getAppConfiguration());
+transientRegistry.addTransientRange(new RowRange(startRow, endRow));
+
+//Initialize Fluo using fluoConfig. This will store the registered ranges in
+//zookeeper making them available on any node later.
+```
+
+## Compacting Transient Ranges
+
+Although you may never need to register transient ranges directly, you will
+need to periodically compact transient ranges if using a recipe that registers
+them. Using [TableOperations] this can be done with one line of Java code
+like the following.
+
+```java
+FluoConfiguration fluoConfig = ...;
+TableOperations.compactTransient(fluoConfig);
+```
+
+Fluo Recipes provides an easy way to compact transient ranges from the command line using the
+`fluo exec` command as follows:
+
+```
+fluo exec <app name> org.apache.fluo.recipes.accumulo.cmds.CompactTransient [<interval> [<multiplier>]]
+```
+
+If no arguments are specified, the command will call `compactTransient()` once.
+If `<interval>` is specified, the command will run forever, compacting transient
+ranges and sleeping `<interval>` seconds between compacting each transient range.
+
+In the case where Fluo is backed up in processing data, a transient range could
+have a lot of data queued, and compacting it too frequently would be
+counterproductive. To avoid this, the `CompactTransient` command will consider
+the time it took to compact a range when deciding when to compact that range
+next. This is where the `<multiplier>` argument comes in: the time to sleep
+between compactions of a range is determined as follows. If not specified, the
+multiplier defaults to 3.
+
+```java
+sleepTime = Math.max(compactTime * multiplier, interval);
+```
+
+For example, assume a Fluo application has two transient ranges. Also assume
+CompactTransient is run with an interval of 600 and a multiplier of 10. If the
+first range takes 20 seconds to compact, then it will be compacted again in 600
+seconds. If the second range takes 80 seconds to compact, then it will be
+compacted again in 800 seconds.
+
+[TransientRegistry]: {{ page.javadoc_core }}/org/apache/fluo/recipes/core/common/TransientRegistry.html
+[TableOperations]: {{ page.javadoc_accumulo }}/org/apache/fluo/recipes/accumulo/ops/TableOperations.html
+[export-queue]: {{ page.docs_base }}/recipes/export-queue/
--
To stop receiving notification emails like this one, please contact
['"[email protected]" <[email protected]>'].