I am developing a new suffix array construction algorithm that is not based on the KA, SA-IS or Skew algorithms. Its performance depends on Max(LCPs), the largest value among the longest common prefixes of adjacent suffixes in the suffix array. It will work for 8-bit character strings without any code change, but it needs some refinement to deal with genome data.
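To make the Max(LCPs) quantity concrete, here is a minimal sketch that computes it for a short string. It is not the new algorithm described above: it assumes a naive sort-based suffix array plus the well-known Kasai LCP computation, and the function names (`suffix_array`, `lcp_array`, `max_lcp`) are made up for illustration only.

```python
def suffix_array(text):
    # Naive construction: sort suffix start positions by the suffix text.
    # Fine for small examples, far too slow for real genomes.
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text, sa):
    # Kasai-style LCP computation: lcp[k] is the length of the longest
    # common prefix of the suffixes at sa[k-1] and sa[k]; lcp[0] is 0.
    n = len(text)
    rank = [0] * n
    for pos, start in enumerate(sa):
        rank[start] = pos
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1
        else:
            h = 0
    return lcp


def max_lcp(text):
    # Largest entry of the LCP array, i.e. Max(LCPs) for this string.
    if not text:
        return 0
    return max(lcp_array(text, suffix_array(text)))


if __name__ == "__main__":
    for s in ("ACGT", "ACACACAC", "AAAAAAAA"):
        print(s, max_lcp(s))
```

Highly repetitive inputs such as `AAAAAAAA` or `ACACACAC` drive Max(LCPs) close to the string length, which is why the repeat structure of DNA matters for an algorithm whose cost depends on this value.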
I would like to learn some domain-specific facts about genome DNA sequence data. I know nothing about DNA sequences or biology.

1. Which are the best books about genome DNA sequence processing for someone who is developing a new suffix array construction algorithm and wants it to work well for DNA analysis?

2. Is there any existing suffix array construction algorithm whose performance depends on Max(LCPs)?

3. A genome DNA sequence file contains only 4 characters: A, C, G and T. Is that right? I also found another character, U, in RNA. Does such a file still contain only 4 characters?

4. If the number of characters in a file is limited to 4 and all the repeating patterns are known, I can design specific technical refinements to improve my algorithm's performance. In addition to repeats of 1, 2, 3 and 4 characters, are repeating units of 5 or more characters common? If they are, how many different characters does the longest common repeating unit contain?

1-character repeat: AAAAAAAA...
2-character repeat: ACACACACACACACA...

Thank you.

Weng
