Here is a website, hosted on Google Sites, with suffix array test results: https://sites.google.com/site/yuta256/sais
I want to know whether the test corpus contains biological DNA data. It has a file named chr22.dna. Is that the DNA of chromosome 22?

Weng

On Thursday, August 7, 2014 6:26:19 PM UTC-7, [email protected] wrote:
> I am developing a new suffix array construction algorithm that is not
> based on the KA, SA-IS or Skew algorithms. Its performance depends on
> Max(LCPs), the largest of the longest-common-prefix values of the suffix
> array. It will work perfectly for 8-bit character strings without any
> code change. It needs some refinement to deal with genome data.
>
> I would like to learn some specifics about genome DNA test data. I know
> nothing about DNA sequences or biology.
>
> 1. Which books on genome DNA sequence processing would you recommend for
>    someone who is developing a new suffix array construction algorithm
>    and wants it to work well for DNA analysis?
> 2. Is there any existing suffix array construction algorithm whose
>    performance depends on Max(LCPs)?
> 3. A genome DNA test file contains only 4 characters: A, C, G and T. Is
>    that right? I found another character, U, in RNA. Does the file still
>    contain only 4 characters?
> 4. If the number of characters in a file is limited to 4, and all
>    repeating patterns are known, I can design special refinements to
>    improve my algorithm's performance. I want to know whether, in
>    addition to repetitions of 1, 2, 3 and 4 characters, repetitions of
>    5 or more characters are common. If they are, how many different
>    characters does the largest common repeating pattern contain?
>    1-character repetition: AAAAAAAA...
>    2-character repetition: ACACACACACACACA...
>
> Thank you.
>
> Weng
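To make the quantities in these questions concrete, here is a minimal Python sketch. It is not the algorithm described in the message, which is not shown in this thread. It builds a suffix array by naive sorting, derives the LCP array with the well-known Kasai method, and reports Max(LCPs) together with the set of symbols actually present in a string. The function names (`suffix_array_naive`, `lcp_kasai`, `max_lcp`, `symbol_counts`) are hypothetical and chosen only for illustration.

```python
from collections import Counter


def suffix_array_naive(text: str) -> list[int]:
    """Sort suffix start positions by comparing the suffixes directly.

    Simple but slow (quadratic or worse); for illustration only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_kasai(text: str, sa: list[int]) -> list[int]:
    """Kasai-style LCP array.

    lcp[k] is the length of the common prefix of the suffixes starting
    at sa[k - 1] and sa[k]; lcp[0] is 0 by convention.
    """
    n = len(text)
    rank = [0] * n
    for k, start in enumerate(sa):
        rank[start] = k
    lcp = [0] * n
    h = 0  # current match length, reused between iterations
    for i in range(n):
        if rank[i] == 0:
            h = 0
            continue
        j = sa[rank[i] - 1]  # suffix just before suffix i in sorted order
        while i + h < n and j + h < n and text[i + h] == text[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h > 0:
            h -= 1
    return lcp


def max_lcp(text: str) -> int:
    """Largest LCP value, i.e. the length of the longest repeated substring."""
    if not text:
        return 0
    return max(lcp_kasai(text, suffix_array_naive(text)))


def symbol_counts(text: str) -> Counter:
    """Count each distinct symbol, to check the 4-letter assumption on real data."""
    return Counter(text)


if __name__ == "__main__":
    for sample in ("AAAAAAAA", "ACACACAC", "ACGT"):
        print(sample, max_lcp(sample), sorted(symbol_counts(sample)))
```

A run of a single symbol of length n gives Max(LCPs) = n - 1, so long repetitions like those in item 4 push Max(LCPs) up quickly. Running `symbol_counts` over the contents of the actual test file is a direct way to check item 3 rather than assuming the alphabet is exactly A, C, G and T.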
