This document describes the WWW BLAST interface.
     BLAST (Basic Local Alignment Search Tool) is  the  heuristic
     search  algorithm  employed  by the programs blastp, blastn,
     blastx, tblastn, and tblastx; these programs ascribe  signi-
     ficance  to  their findings using the statistical methods of
     Karlin and Altschul (1990, 1993) with  a  few  enhancements.
     The  BLAST  programs  were  tailored for sequence similarity
     searching -- for example to identify  homologs  to  a  query
     sequence.   The programs are not generally useful for motif-
     style searching.  For a discussion of basic issues in  simi-
     larity  searching of sequence databases, see Altschul et al.
     (1994).
     The five BLAST programs described here perform the following
     tasks:
     blastp    compares an amino acid query  sequence  against  a
               protein sequence database;
     blastn    compares a nucleotide  query  sequence  against  a
               nucleotide sequence database;
     blastx    compares  the  six-frame  conceptual   translation
               products  of  a  nucleotide  query  sequence (both
               strands) against a protein sequence database;
     tblastn   compares  a  protein  query  sequence  against   a
               nucleotide     sequence    database    dynamically
               translated  in  all  six  reading   frames   (both
               strands).
     tblastx   compares the six-frame translations of  a  nucleo-
               tide query sequence against the six-frame transla-
               tions of a nucleotide sequence database.
BLAST Search parameters
     The fundamental unit of BLAST algorithm output is the  High-
     scoring Segment Pair (HSP).  An HSP consists of two sequence
     fragments of arbitrary but equal length whose  alignment  is
     locally  maximal  and for which the alignment score meets or
     exceeds a threshold or cutoff score.  A set of HSPs is  thus
     defined  by  two  sequences,  a scoring system, and a cutoff
     score; this set may be empty if the cutoff score  is  suffi-
     ciently  high.   In  the programmatic implementations of the
     BLAST algorithm described here, each HSP consists of a  seg-
     ment  from  the  query  sequence  and  one  from  a database
     sequence.  The sensitivity and speed of the programs can  be
     adjusted  via  the standard BLAST algorithm parameters W, T,
     and X (Altschul et al., 1990); selectivity of  the  programs
     can be adjusted via the cutoff score.
     A Maximal-scoring Segment  Pair  (MSP)  is  defined  by  two
     sequences and a scoring system and is the highest-scoring of
     all possible segment pairs that can be produced from the two
     sequences.   The  statistical methods of Karlin and Altschul
     (1990, 1993) are applicable to determining the  significance
     of MSP scores in the limit of long sequences, under a random
     sequence model that assumes independent and identically dis-
     tributed  choices  for  the residues at each position in the
     sequences.  In the programs described here,  Karlin-Altschul
     statistics  have  been extrapolated to the task of assessing
     the significance of HSP scores obtained from comparisons  of
     potentially short, biological sequences.
     The approach to similarity searching taken by the BLAST pro-
     grams  is  first to look for similar segments (HSPs) between
     the query sequence and a database sequence, then to evaluate
     the statistical significance of any matches that were found,
     and finally to report only  those  matches  that  satisfy  a
     user-selectable threshold of significance.  Findings of mul-
     tiple HSPs involving the query sequence and a  single  data-
     base  sequence  may be treated statistically in a variety of
     ways.  By default the programs use "Sum" statistics  (Karlin
     and  Altschul, 1993).  As such, the statistical significance
     ascribed to a set of HSPs may be higher than  that  ascribed
     to any individual member of the set.  Only when the ascribed
     significance  satisfies  the  user-selectable  threshold  (E
     parameter) will the match be reported to the user.
     The task of finding HSPs begins with identifying short words
     of  length  W  in  the  query  sequence that either match or
     satisfy some positive-valued threshold score T when  aligned
     with a word of the same length in a database sequence.  T is
     referred  to  as  the  neighborhood  word  score   threshold
     (Altschul  et  al.,  1990).  These initial neighborhood word
     hits act as seeds for initiating  searches  to  find  longer
     HSPs  containing  them.   The word hits are extended in both
     directions along each sequence for as far as the  cumulative
     alignment  score  can  be  increased.  Extension of the word
     hits in each direction  are  halted  when:   the  cumulative
     alignment score falls off by the quantity X from its maximum
     achieved value; the cumulative score goes to zero or  below,
     due  to  the  accumulation  of  one or more negative-scoring
     residue  alignments;  or  the  end  of  either  sequence  is
     reached.
     From Karlin and  Altschul  (1990),  the  principal  equation
     relating  the  score  of an HSP to its expected frequency of
     chance occurrence is:
                        E = K N exp(-Lambda S)
     where E is the expected frequency of chance occurrence of an
     HSP having score S (or one scoring higher); K and Lambda are
     Karlin-Altschul parameters; N is the product  of  the  query
     and  database  sequence  lengths,  or the size of the search
     space; and exp is the exponentiation function.
     Lambda may be thought of as the expected increase in  relia-
     bility  of  an  alignment associated with a unit increase in
     alignment score.  Reliability in this case is  expressed  in
     units  of  information,  such  as bits or nats, with one nat
     being equivalent to 1/log(2) (roughly 1.44) bits.
     The expectation E (range 0 to infinity)  calculated  for  an
     alignment between the query sequence and a database sequence
     can be  extrapolated  to  an  expectation  over  the  entire
     database search, by converting the pairwise expectation to a
     probability (range 0-1) and multiplying the  result  by  the
     ratio of the entire database size (expressed in residues) to
     the length of the matching database sequence.  In detail:
                   E_database = (1 - exp(-E)) D / d
     where D is the size of the database; d is the length of  the
     matching  database  sequence; and the quantity (1 - exp(-E))
     is the probability, P, corresponding to  the  expectation  E
     for  the  pairwise  sequence  comparison.   Note that in the
     limit of infinite E, P approaches 1; and in the limit  as  E
     approaches  0, E and P approach equality.  Due to inaccuracy
     in the statistical methods as they are applied in the  BLAST
     programs, whenever E and P are less than about 0.05, the two
     values can be practically treated as being equal.
     In contrast to the random sequence  model  used  by  Karlin-
     Altschul statistics, biological sequences are often short in
     length -- an HSP may involve a relatively large fraction  of
     the  query or database sequence, which reduces the effective
     size of the 2-dimensional search space defined  by  the  two
     sequences.   To obtain more accurate significance estimates,
     the BLAST programs compute effective lengths for  the  query
     and database sequences that are their real lengths minus the
     expected length of the HSP, where the expected length for an
     HSP is computed from its score.  In no event is an effective
     length for the query or database sequence  permitted  to  go
     below  1.  Thus, the effective length of either the query or
     the database sequence is computed according to  the  follow-
     ing:
          Length_eff = MAX( Length_real - Lambda S / H , 1)
     where H is the relative entropy of the target and background
     residue  frequencies (Karlin and Altschul, 1990), one of the
     statistics reported by the BLAST programs.  H may be thought
     of as the information expected to be obtained from each pair
     of aligned residues in a real alignment  that  distinguishes
     the alignment from a random one.
     The default scoring matrix used by blastp, blastx,  tblastn,
     and  tblastx  is the BLOSUM62 matrix (Henikoff and Henikoff,
     1992).  
     Several PAM (point  accepted  mutations  per  100  residues)
     amino  acid  scoring  matrices  are  provided  in  the BLAST
     software distribution,  including  the  PAM40,  PAM120,  and
     PAM250.  While the BLOSUM62 matrix is a good general purpose
     scoring matrix and is the default matrix used by  the  BLAST
     programs,  if  one  is  restricted to using only PAM scoring
     matrices, then the PAM120 is recommended for general protein
     similarity  searches  (Altschul,  1991).  The pam(1) program
     can be used to produce PAM matrices of any desired iteration
     from  2  to  511.   Each matrix is most sensitive at finding
     similarities at  its  particular  PAM  distance.   For  more
     thorough searches, particularly when the mutational distance
     between potential homologs is unknown and  the  significance
     of  their  similarity  may be only marginal, Altschul (1991,
     1992) recommends performing at  least  three  searches,  one
     each with the PAM40, PAM120 and PAM250 matrices.
     In blastn, the M parameter sets the reward score for a  pair
     of matching residues; the N parameter sets the penalty score
     for mismatching residues.  M and  N  must  be  positive  and
     negative integers, respectively.  The relative magnitudes of
     M and N determines the number of nucleic  acid  PAMs  (point
     accepted mutations per 100 residues) for which they are most
     sensitive  at  finding  homologs.   Higher  ratios  of   M:N
     correspond to increasing nucleic acid PAMs (increased diver-
     gence).  The default values for M and N, respectively 5  and
     -4,  having  a ratio of 1.25, correspond to about 47 nucleic
     acid PAMs, or about 58 amino acid PAMs; an M:N  ratio  of  1
     corresponds  to  30 nucleic acid PAMs or 38 amino acid PAMs.
     At higher than about 40 nucleic acid PAMs, or 50 amino  acid
     PAMs,  better  sensitivity at detecting similarities between
     coding regions is expected by performing comparisons at  the
     amino  acid  level (States et al., 1991), using conceptually
     translated nucleotide sequences (re:  blastx,  tblastn,  and
     tblastx).
     Independent of the values chosen for M and  N,  the  default
     wordlength  W=11  used  by  blastn  restricts the program to
     finding sequences that share at least an 11-mer  stretch  of
     100%  identity  with  the  query.  Under the random sequence
     model, stretches of 11  consecutive  matching  residues  are
     unlikely  to  occur  merely  by  chance  even  between  only
     moderately diverged homologs.  Thus, blastn with its default
     parameter  settings is poorly suited to finding anything but
     very similar sequences.  If better  sensitivity  is  needed,
     one should use a smaller value for W.
     For the blastn program, it may be easy to see how  multiply-
     ing both M and N by some large number will yield proportion-
     ally larger alignment scores with their statistical signifi-
     cance  remaining  unchanged.  This scale-independence of the
     statistical significance estimates from blastn has its  ana-
     log  in  the  scoring  matrices used by the other BLAST pro-
     grams:  multiplying all elements in a scoring matrix  by  an
     arbitrary  factor  will  proportionally  alter the alignment
     scores but will not  alter  their  statistical  significance
     (assuming  numerical precision is maintained).  From this it
     should be clear that raw alignment  scores  are  meaningless
     without  specific  knowledge  of the scoring matrix that was
     used.
     Regardless of the scoring  scheme  employed,  two  stringent
     criteria  must  be  met in order to be able to calculate the
     Karlin-Altschul parameters Lambda and K.  First,  given  the
     residue  composition  for the query sequence and the residue
     composition assumed for the database,  the  alignment  score
     expected  for  any  randomly  selected pair of residues (one
     from the query sequence and one from the database)  must  be
     negative.   Second,  given the sequence residue compositions
     and the scoring scheme, a positive score must be possible to
     achieve.   For  instance,  the  match reward score of blastn
     must have a positive value; and given the assumption made by
     blastn  that the 4 nucleotides A, C, G and T are represented
     at equal 25% frequencies in the database, a  wide  range  of
     value  combinations  for  M  and N are precluded from use --
     namely those combinations where the magnitude of  the  ratio
     M:N is greater than or equal to 3.
     The parameter C can be set to a positive integer  to  select
     the  genetic code that will be used by blastx and tblastx to
     translate the query sequence.  The -dbgcode parameter can be
     used  to select an alternate genetic code for translation of
     the database by the programs tblastn and tblastx.   In  each
     case,  the  default genetic code is the so-called "Standard"
     or "Universal" genetic code.  To obtain  a  listing  of  the
     genetic codes available and their associated numerical iden-
     tifiers, invoke blastx or  tblastx  with  the  command  line
     parameter C=list. Note:  the numerical identifiers used here
     for  genetic  codes  parallel  those  defined  in  the  NCBI
     software  Toolbox;  hence  some  numerical  values  will  be
     skipped as genetic codes are updated.
     The list of genetic codes  available  and  their  associated
     values for the parameters C and -dbgcode are:
     1 Standard or Universal
     2 Vertebrate Mitochondrial
     3 Yeast Mitochondrial
     4   Mold,   Protozoan,   Coelenterate   Mitochondrial    and
     Mycoplasma/Spiroplasma
     5 Invertebrate Mitochondrial
     6 Ciliate Macronuclear
     9 Echinodermate Mitochondrial
     10 Alternative Ciliate Macronuclear
     11 Eubacterial
     12 Alternative Yeast
     13 Ascidian Mitochondrial
     14 Flatworm Mitochondrial
     The Expect and P-values reported for HSPs are  dependent  on
     several factors including:  the scoring system employed, the
     residue composition of the query sequence, an assumed  resi-
     due  composition for a typical database sequence, the length
     of the query sequence, and the total length of the database.
     HSP  scores from different program invocations are appropri-
     ate for comparison even if the  databases  searched  are  of
     different  lengths,  as  long as the other factors mentioned
     here do  not  vary.   For  example,  alignment  scores  from
     searches  with  the  default  BLOSUM62  matrix should not be
     directly compared  with  scores  obtained  with  the  PAM120
     matrix;  and  scores produced using two versions of the same
     PAM matrix, each created to different  scales  (see  above),
     can  not  be meaningfully compared without conversion to the
     same scale.
     Some isolation from the many factors involved  in  assessing
     the  statistical  significance  of  HSPs  can be attained by
     observing the information content reported (in bits) for the
     alignments.   While  the  information  content of an HSP may
     change when different scoring systems are used  (e.g.,  with
     different  PAM matrices), the number of bits reported for an
     HSP will at least be independent of the scale to  which  the
     scoring  matrix was generated.  (In practice, this statement
     is not quite true, because the alignment scores used by  the
     BLAST  programs  are integers that lack much precision).  In
     other words, when conveying the statistical significance  of
     an  alignment,  the  alignment  score  itself  is not useful
     unless the specific scoring matrix that was employed is also
     provided, but the informativeness of an alignment is a mean-
     ingful statistic that can be  used  to  ascribe  statistical
     significance  (a  P-value)  to  the  match  independently of
     specific knowledge about the scoring matrix.
     The BLAST programs all provide information  in  roughly  the
     same  format.   First  comes (A) an introduction to the pro-
     gram; (B) a histogram of expectations (see above) if one was
     requested; (C) a series of one-line descriptions of matching
     database sequences; (D) the actual sequence alignments;  and
     finally  the parameters and other statistics gathered during
     the search.
     Sample blastp output from comparing pir|A01243|DXCH  against
     the SWISS-PROT database is presented below.
  A. Program Introduction
     The introductory output provides the program name (BLASTP in
     this  case),  the version number (1.4.6MP in this case), the
     date the program  source  code  last  changed  substantially
     (June  13,  1994), the date the program was built (Sept. 22,
     1994), and a description of the query sequence and  database
     to be searched.  These may all be important pieces of infor-
     mation if a  bug  is  suspected  or  if  reproducibility  of
     results is important.
     The "Searching..." indicator  indicates  progress  that  the
     program made in searching the database.  A complete database
     search will yield 50 periods (.), or one period per database
     sequence,  whichever  number  is  smaller.  When searching a
     database consisting of 50 sequences or more, if  fewer  than
     50  periods  are  displayed and the program aborted for some
     reason, dividing the number of periods by 0.5 will yield the
     approximate  percentage  (0-100%)  of  the database that was
     searched before the program died.  If the program had diffi-
     culty  making  progress  through  the  database, one or more
     asterisks (*) may be interspersed  between  the  periods  at
     one-minute intervals.
  B. Histogram of Expectations
     Shown in the output below is a histogram of the lowest (most
     significant)  Expect  values  obtained  with  each  database
     sequence.  This information is  useful  in  determining  the
     numbers  of  database  sequences  that achieved a particular
     level of statistical significance.  It indicates the  number
     of database matches that would be reportable at various set-
     tings for the expectation threshold (E parameter).
  C. One-line Summaries
     The one-line sequence descriptions and summaries of  results
     are useful for identifying biologically interesting database
     matches and correlating this interest with  the  statistical
     significance  estimates.   Unless  otherwise  requested, the
     database sequences are sorted by increasing P-value  (proba-
     bility).   Identifiers  for the database sequences appear in
     the first column;  then  come  brief  descriptions  of  each
     sequence,  which may need to be truncated in order to fit in
     the available space.  The "High Score" column  contains  the
     score  of  the  highest-scoring HSP found with each database
     sequence.  The "P(N)" column  contains  the  lowest  P-value
     ascribed  to any set of HSPs for each database sequence; and
     the "N" column displays the number of HSPs in the set  which
     was  ascribed  the lowest P-value.  The P-values are a func-
     tion of N, as used in Karlin-Altschul  "Sum"  statistics  or
     Poisson  statistics, to treat situations where multiple HSPs
     are found.  It should be noted that the highest-scoring  HSP
     whose  score  is  reported in the "High Score" column is not
     necessarily a member of the set of  HSPs  which  yields  the
     lowest P-value; the highest-scoring HSP may be excluded from
     this set on the basis of  consistency  rules  governing  the
     grouping  of HSPs (see the -consistency option).  Numbers of
     the form "7.7e-160" are in  scientific  notation.   In  this
     particular  example,  the  number  being  represented is 7.7
     times 10 to the minus 160th power.  which is  astronomically
     close to zero.
  D. Alignments
     Alignments found with  the  BLAST  algorithm  are  ungapped.
     Several  statistics  are used to describe each HSP:  the raw
     alignment Score; the raw score converted to bits of informa-
     tion  by  multiplying by Lambda (see the Statistics output);
     the number of times one might Expect to see such a match (or
     a  better one) merely by chance; the P-value (probability in
     the range 0-1) of observing such a  match;  the  number  and
     fraction  of  total residues in the HSP which are identical;
     the number and fraction of residues for which the  alignment
     scores  have positive values.  When Sum statistics have been
     used to calculate the Expect and P-values,  the  P-value  is
     qualified  with  the  word "Sum" and the N parameter used in
     the Sum statistics is provided in  parentheses  to  indicate
     the  number of HSPs in the set; when Poisson statistics have
     been used to calculate the Expect and P-values, the  P-value
     is qualified with the word "Poisson".  Between the two lines
     of Query and Subject (database) sequence is a line  indicat-
     ing  the  specific  residues which are identical, as well as
     those which are non-identical but nevertheless have positive
     alignment scores defined in the scoring matrix that was used
     (the BLOSUM62 matrix in this case).   Identical  letters  or
     residues,  when  paired with each other, are not highlighted
     if their alignment score is negative or zero.   Examples  of
     this  would  be  an X juxtaposed with an X in two amino acid
     sequences, or an N juxtaposed with another N in two  nucleo-
     tide sequences.  Such ambiguous residue-residue pairings may
     be uninformative and thus lend no  support  to  the  overall
     alignment being either real or random; however, the informa-
     tiveness of these pairings is left up to  the  user  of  the
     BLAST  programs to decide, because any values desired can be
     specified in a scoring matrix of the user's own making.
     BLASTP 1.4.6MP [13-Jun-94] [Build 13:58:36 Sep 22 1994]
     Reference:  Altschul, Stephen F., Warren Gish, Webb Miller, Eugene W. Myers,
     and David J. Lipman (1990).  Basic local alignment search tool.  J. Mol. Biol.
     215:403-10.
     Query =  pir|A01243|DXCH  232 Gene X protein - Chicken (fragment)
             (232 letters)
     Database:  SWISS-PROT Release 29.0
                38,303 sequences; 13,464,008 total letters.
     Searching..................................................done
          Observed Numbers of Database Sequences Satisfying
          Various EXPECTation Thresholds (E parameter values)
             Histogram units:      = 31 Sequences     : less than 31 sequences
      EXPECTation Threshold
      (E parameter)
         |
         V   Observed Counts-->
       10000 4863 1861 |============================================================
        6310 3002  782 |=========================
        3980 2220  812 |==========================
        2510 1408  303 |=========
        1580 1105  393 |============
        1000  712  179 |=====
         631  533  161 |=====
         398  372   80 |==
         251  292   73 |==
         158  219   50 |=
         100  169   32 |=
        63.1  137   18 |:
        39.8  119    9 |:
        25.1  110    6 |:
        15.8  104    9 |:
      >>>>>>>>>>>>>>>>>>>>>  Expect = 10.0, Observed = 95  <<<<<<<<<<<<<<<<<
        10.0   95    4 |:
        6.31   91    3 |:
        3.98   88    1 |:
        2.51   87    3 |:
        1.58   84    0 |
        1.00   84    2 |:
                                                                          Smallest
                                                                            Sum
                                                                   High  Probability
     Sequences producing High-scoring Segment Pairs:              Score  P(N)      N
     sp|P01013|OVAX_CHICK GENE X PROTEIN (OVALBUMIN-RELATED) (...  1191  7.7e-160  1
     sp|P01014|OVAY_CHICK GENE Y PROTEIN (OVALBUMIN-RELATED).       949  7.0e-127  1
     sp|P01012|OVAL_CHICK OVALBUMIN (PLAKALBUMIN).                  645  3.4e-100  2
     sp|P19104|OVAL_COTJA OVALBUMIN.                                626  1.2e-96   2
     sp|P05619|ILEU_HORSE LEUKOCYTE ELASTASE INHIBITOR (LEI).       216  3.7e-71   3
     sp|P80229|ILEU_PIG   LEUKOCYTE ELASTASE INHIBITOR (LEI) (...   325  4.0e-71   2
     sp|P29508|SCCA_HUMAN SQUAMOUS CELL CARCINOMA ANTIGEN (SCC...   439  3.5e-70   2
     sp|P30740|ILEU_HUMAN LEUKOCYTE ELASTASE INHIBITOR (LEI) (...   211  1.3e-66   3
     sp|P05120|PAI2_HUMAN PLASMINOGEN ACTIVATOR INHIBITOR-2, P...   176  1.8e-65   4
     sp|P35237|PTI_HUMAN  PLACENTAL THROMBIN INHIBITOR.             473  1.3e-61   1
     sp|P29524|PAI2_RAT   PLASMINOGEN ACTIVATOR INHIBITOR-2, T...   183  9.4e-61   4
     sp|P12388|PAI2_MOUSE PLASMINOGEN ACTIVATOR INHIBITOR-2, M...   179  1.8e-60   4
     sp|P36952|MASP_HUMAN MASPIN PRECURSOR.                         198  2.6e-58   4
     sp|P32261|ANT3_MOUSE ANTITHROMBIN-III PRECURSOR (ATIII).       142  4.0e-48   5
     sp|P01008|ANT3_HUMAN ANTITHROMBIN-III PRECURSOR (ATIII).       122  7.5e-48   5
     WARNING:  Descriptions of 80 database sequences were not reported due to the
               limiting value of parameter V = 15.
       ... alignments with the top 8 database sequences deleted ...
     >sp|P05120|PAI2_HUMAN PLASMINOGEN ACTIVATOR INHIBITOR-2, PLACENTAL (PAI-2)
                 (MONOCYTE ARG- SERPIN).
                 Length = 415
      Score = 176 (80.2 bits), Expect = 1.8e-65, Sum P(4) = 1.8e-65
      Identities = 38/89 (42%), Positives = 50/89 (56%)
     Query:     1 QIKDLLVSSSTDLDTTLVLVNAIYFKGMWKTAFNAEDTREMPFHVTKQESKPVQMMCMNN 60
                  +I +LL   S D DT +VLVNA+YFKG WKT F  +     PF V   +  PVQMM +
     Sbjct:   180 KIPNLLPEGSVDGDTRMVLVNAVYFKGKWKTPFEKKLNGLYPFRVNSAQRTPVQMMYLRE 239
     Query:    61 SFNVATLPAEKMKILELPFASGDLSMLVL 89
                    N+  +   K +ILELP+A      L+L
     Sbjct:   240 KLNIGYIEDLKAQILELPYAGDVSMFLLL 268
      Score = 165 (75.2 bits), Expect = 1.8e-65, Sum P(4) = 1.8e-65
      Identities = 33/78 (42%), Positives = 47/78 (60%)
     Query:   155 ANLTGISSAESLKISQAVHGAFMELSEDGIEMAGSTGVIEDIKHSPESEQFRADHPFLFL 214
                  AN +G+S    L +S+  H A ++++E+G E A  TG +   +      QF ADHPFLFL
     Sbjct:   338 ANFSGMSERNDLFLSEVFHQAMVDVNEEGTEAAAGTGGVMTGRTGHGGPQFVADHPFLFL 397
     Query:   215 IKHNPTNTIVYFGRYWSP 232
                  I H  T  I++FGR+ SP
     Sbjct:   398 IMHKITKCILFFGRFCSP 415
      Score = 144 (65.6 bits), Expect = 1.8e-65, Sum P(4) = 1.8e-65
      Identities = 26/62 (41%), Positives = 41/62 (66%)
     Query:    90 LPDEVSDLERIEKTINFEKLTEWTNPNTMEKRRVKVYLPQMKIEEKYNLTSVLMALGMTD 149
                  + D  + LE +E  I ++KL +WT+ + M +  V+VY+PQ K+EE Y L S+L ++GM D
     Sbjct:   272 IADVSTGLELLESEITYDKLNKWTSKDKMAEDEVEVYIPQFKLEEHYELRSILRSMGMED 331
     Query:   150 LF 151
                   F
     Sbjct:   332 AF 333
      Score = 61 (27.8 bits), Expect = 1.8e-65, Sum P(4) = 1.8e-65
      Identities = 10/17 (58%), Positives = 16/17 (94%)
     Query:    81 SGDLSMLVLLPDEVSDL 97
                  +GD+SM +LLPDE++D+
     Sbjct:   259 AGDVSMFLLLPDEIADV 275
     WARNING:  HSPs involving 86 database sequences were not reported due to the
               limiting value of parameter B = 9.
     Parameters:
       V=15
       B=9
       H=1
       -ctxfactor=1.00
       E=10
       Query                        -----  As Used  -----    -----  Computed  ----
       Frame  MatID Matrix name     Lambda    K       H      Lambda    K       H
        +0      0   BLOSUM62        0.316   0.132   0.370    same    same    same
       Query
       Frame  MatID  Length  Eff.Length   E    S W   T  X     E2  S2
        +0      0      232       232      10. 57 3  11 22    0.22 33
     Statistics:
       Query          Expected         Observed           HSPs       HSPs
       Frame  MatID  High Score       High Score       Reportable  Reported
        +0      0    62 (28.2 bits)  1191 (542.5 bits)     330         24
       Query         Neighborhd  Word      Excluded    Failed   Successful  Overlaps
       Frame  MatID   Words      Hits        Hits    Extensions Extensions  Excluded
        +0      0      4988     5661199     1146395     4504598    10187        13
       Database:  SWISS-PROT Release 29.0
         Release date:  June 1994
         Posted date:  1:29 PM EDT Jul 28, 1994
       # of letters in database:  13,464,008
       # of sequences in database:  38,303
       # of database sequences satisfying E:  95
       No. of states in DFA:  561 (55 KB)
       Total size of DFA:  110 KB (128 KB)
       Time to generate neighborhood:  0.03u 0.01s 0.04t  Real: 00:00:00
       No. of processors used:  8
       Time to search database:  32.27u 0.78s 33.05t  Real: 00:00:04
       Total cpu time:  32.33u 0.91s 33.24t  Real: 00:00:05
     WARNINGS ISSUED:  2
     This work is in the public domain.
     Altschul,  Stephen  F.  (1991).   Amino  acid   substitution
     matrices  from an information theoretic perspective. J. Mol.
     Biol.  219:555-65.
     Altschul, S. F. (1993).  A protein alignment scoring  system
     sensitive  at  all  evolutionary  distances.  J.  Mol. Evol.
     36:290-300.
     Altschul, S. F., M. S. Boguski, W. Gish and  J.  C.  Wootton
     (1994).   Issues  in searching molecular sequence databases.
     Nature Genetics 6:119-129.
     Altschul, Stephen F., Warren Gish, Webb  Miller,  Eugene  W.
     Myers,  and  David  J. Lipman (1990).  Basic local alignment
     search tool. J. Mol. Biol.  215:403-10.
     Claverie,  J.-M.  and  D.  J.  States  (1993).   Information
     enhancement  methods for large scale sequence analysis. Com-
     puters in Chemistry 17:191-201.
     Gish, W. and D. J. States (1993).  Identification of protein
     coding  regions by database similarity search. Nature Genet-
     ics 3:266-72.
     Henikoff, Steven and Jorga G. Henikoff (1992).   Amino  acid
     substitution matrices from protein blocks. Proc. Natl. Acad.
     Sci. USA 89:10915-19.
     Karlin, Samuel and Stephen F. Altschul (1990).  Methods  for
     assessing the statistical significance of molecular sequence
     features by using general scoring schemes. Proc. Natl. Acad.
     Sci. USA 87:2264-68.
     Karlin, Samuel and Stephen F. Altschul (1993).  Applications
     and statistics for multiple high-scoring segments in molecu-
     lar sequences. Proc. Natl. Acad. Sci. USA 90:5873-7.
     States, D. J. and W. Gish (1994).  Combined use of  sequence
     similarity  and codon bias for coding region identification.
     J. Comput. Biol.  1:39-50.
     States, D. J., W. Gish and S. F. Altschul (1991).   Improved
     sensitivity  of  nucleic  acid  database similarity searches
     using application specific scoring matrices. Methods: A com-
     panion to Methods in Enzymology 3:66-70.
     Wootton, J. C. and S. Federhen (1993).  Statistics of  local
     complexity  in  amino acid sequences and sequence databases.
     Computers in Chemistry 17:149-163.